Proceedings of
DMS 2009
The 15th International Conference on
Distributed Multimedia Systems
San Francisco
September 10-12, 2009
Co-Sponsored by
Knowledge Systems Institute Graduate School, USA
Eco Controllo SpA, Italy
University of Salerno, Italy
University Ca' Foscari in Venice, Italy
Technical Program
September 10 - 12, 2009
Hotel Sofitel, Redwood City, San Francisco Bay, USA
Organized by
Knowledge Systems Institute Graduate School
Copyright © 2009 by Knowledge Systems Institute Graduate School
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or
otherwise, without the prior written consent of the publisher.
ISBN 1-891706-25-X (paper)
Additional Copies can be ordered from:
Knowledge Systems Institute Graduate School
3420 Main Street
Skokie, IL 60076, USA
Email: [email protected]
Proceedings preparation, editing and printing are co-sponsored by
Knowledge Systems Institute Graduate School, USA
Eco Controllo SpA, Italy
Printed by Knowledge Systems Institute Graduate School
DMS 2009 Foreword
Welcome to DMS 2009, the 15th edition of the International Conference on Distributed Multimedia
Systems. In past years, the DMS series of Conferences has approached the broad field of distributed
multimedia systems from several complementary perspectives: theory, methodology, technology,
systems and applications. The contributions of highly qualified authors from academia and industry,
in the form of research papers, case studies, technical discussions and presentation of ongoing
research, collected in this proceedings volume, offer a picture of current research and trends in the
dynamic fields of information technology.
The main conference themes have been organized, according to a formula consolidated during past
editions, into a number of thematic tracks, offering conference attendees and readers a
convenient way to explore this vast amount of knowledge in an organized way. Two additional
workshops extended the main conference offerings and completed the conference program: the
International Workshop on Distance Education Technologies (DET 2009) and the International
Workshop on Visual Languages and Computing (VLC 2009). Their papers are included here.
The selection of the papers presented this year at the DMS conference and at the two workshops
was based upon a rigorous review process, with an acceptance rate of about 40% of the
submissions received in the category of full research papers. Short papers reporting ongoing
research activities and applications complete the conference content, fostering timely
discussions among the participants, not only on consolidated research achievements but also
on ongoing ideas and experiments.
Twenty-three countries are represented this year: Austria, Brazil, Canada, China, Czech Republic,
France, Germany, India, Italy, Japan, Jordan, Lebanon, Malaysia, Myanmar, New Zealand,
Portugal, Spain, Sweden, Switzerland, Taiwan, United Kingdom, United States, and Vietnam,
giving a truly “distributed” atmosphere to the conference itself.
As program co-chairs, we appreciate having the opportunity to bring out this new edition of
proceedings. We acknowledge the effort of the program committee members in reviewing the
submitted papers under very strict deadlines, and the valuable advice of the conference chairs
Masahito Hirakawa and Erland Jungert. Daniel Li has given excellent support by promptly replying
to our requests for information about organization and technical issues. The excellent guidance of
Dr. S.K. Chang has led to the success of this whole process, and we take this opportunity to thank
him once again.
Finally, we thank Eco Controllo SpA, Italy, for sponsoring in part the printing of the Proceedings,
the University of Salerno, Italy for sponsoring the keynote by Gennady Andrienko, and the
Computer Science Department of Università Ca’ Foscari in Venice, Italy, for the financial support
of one of the program co-chairs.
Augusto Celentano and Atsuo Yoshitaka
DMS 2009 Program Co-Chairs
The 15th International Conference on
Distributed Multimedia Systems
(DMS 2009)
September 10-12, 2009
Hotel Sofitel, Redwood City, San Francisco Bay, USA
Organizers & Committees
Steering Committee Chair
Shi-Kuo Chang, University of Pittsburgh, USA
Conference Co-Chairs
Masahito Hirakawa, Shimane University, Japan
Erland Jungert, Linkoping University, Sweden
Program Co-Chairs
Augusto Celentano, Universita Ca Foscari di Venezia, Italy
Atsuo Yoshitaka, JAIST, Japan
Program Committee
Vasu Alagar, Concordia University, Canada
Frederic Andres, National Institute of Informatics, Japan
Arvind K. Bansal, Kent State University, USA
Ioan Marius Bilasco, Laboratoire d'Informatique de Grenoble (LIG), France
Yeim-Kuan Chang, National Cheng Kung University, Taiwan
Ing-Ray Chen, Virginia Tech (VPI&SU), USA
Shu-Ching Chen, Florida International University, USA
Cheng-Chung Chu, Tunghai University, Taiwan
Gennaro Costagliola, Univ of Salerno, Italy
Alfredo Cuzzocrea, University of Calabria, Italy
Andrea De Lucia, Univ. of Salerno, Italy
Alberto Del Bimbo, Univ. of Florence, Italy
David H. C. Du, Univ. of Minnesota, USA
Jean-Luc Dugelay, Institut EURECOM, France
Larbi Esmahi, National Research Council of Canada, Canada
Ming-Whei Feng, Institute for Information Industry, Taiwan
Daniela Fogli, Universita degli Studi di Brescia, Italy
Farshad Fotouhi, Wayne State University, USA
Alexandre Francois, Tufts University, USA
Kaori Fujinami, Tokyo University of Agriculture and Technology, Japan
Moncef Gabbouj, Tampere University of Technology, Finland
Ombretta Gaggi, Univ. of Padova, Italy
Richard Gobel, FH Hof, Germany
Stefan Goebel, ZGDV Darmstadt, Germany
Forouzan Golshani, Wright State University, USA
Jivesh Govil, Cisco Systems Inc., USA
Angela Guercio, Kent State University, USA
Niklas Hallberg, FOI, Sweden
Hewijin Christine Jiau, National Cheng Kung University, Taiwan
Joemon Jose, University of Glasgow, UK
Wolfgang Klas, University of Vienna, Austria
Yau-Hwang Kuo, National Cheng Kung University, Taiwan
Jen Juan Li, North Dakota State University, USA
Fuhua Lin, Athabasca University, Canada
Alan Liu, National Chung Cheng University, Taiwan
Chien-Tsai Liu, Taipei Medical College, Taiwan
Chung-Fan Liu, Kun Shan University, Taiwan
Jonathan Liu, University of Florida, USA
Andrea Marcante, Universita degli Studi di Milano, Italy
Sergio Di Martino, Universita degli Studi di Napoli Federico II, Italy
Piero Mussio, Universita degli Studi di Milano, Italy
Paolo Nesi, University of Florence, Italy
Vincent Oria, New Jersey Institute of Technology, USA
Sethuraman Panchanathan, Arizona State Univ., USA
Antonio Piccinno, Univ. of Bari, Italy
Sofie Pilemalm, FOI, Sweden
Fabio Pittarello, University of Venice, Italy
Giuseppe Polese, University of Salerno, Italy
Syed M. Rahman, Minnesota State University, USA
Monica Sebillo, Universita di Salerno, Italy
Timothy K. Shih, National Taipei University of Education, Taiwan
Peter Stanchev, Kettering University, USA
Genny Tortora, University of Salerno, Italy
Joseph E. Urban, Texas Tech University, USA
Athena Vakali, Aristotle University, Greece
Ellen Walker, Hiram College, USA
KaiYu Wan, East China Normal University, China
Chi-Lu Yang, Institute for Information Industry, Taiwan
Kang Zhang, University of Texas at Dallas, USA
Publicity Co-Chairs
KaiYu Wan, East China Normal University, China
Chi-Lu Yang, Institute for Information Industry, Taiwan
Proceedings Cover Design
Gabriel Smith, Knowledge Systems Institute Graduate School, USA
Conference Secretariat
Judy Pan, Chair, Knowledge Systems Institute Graduate School, USA
Omasan Etuwewe, Knowledge Systems Institute Graduate School, USA
Dennis Chi, Knowledge Systems Institute Graduate School, USA
David Huang, Knowledge Systems Institute Graduate School, USA
Daniel Li, Knowledge Systems Institute Graduate School, USA
International Workshop on
Distance Education Technologies
(DET 2009)
September 10-12, 2009
Hotel Sofitel, Redwood City, San Francisco Bay, USA
Organizers & Committees
Workshop Co-Chairs
Tim Arndt, Cleveland State University, USA
Heng-Shuen Chen, National Taiwan University, Taiwan
Program Co-Chairs
Paolo Maresca, University Federico II, Napoli, Italy
Qun Jin, Waseda University, Japan
Program Committee
Giovanni Adorni, University of Genova, Italy
Tim Arndt, Cleveland State University, USA
Heng-Shuen Chen, National Taiwan University, Taiwan
Yuan-Sun Chu, National Chung Cheng University, Taiwan
Luigi Colazzo, University of Trento, Italy
Rita Francese, University of Salerno, Italy
Wu He, Old Dominion University, USA
Pedro Isaias, Open University, Portugal
Qun Jin, Waseda University, Japan
Paolo Maresca, University Federico II, Napoli, Italy
Syed M. Rahman, Minnesota State University, USA
Teresa Roselli, University of Bari, Italy
Nicoletta Sala, University of Italian Switzerland, Switzerland
Giuseppe Scanniello, University of Salerno, Italy
Hui-Kai Su, Nanhua University, Taiwan
Yu-Huei Su, National HsinChu University of Education, Taiwan
Kazuo Yana, Hosei University, Japan
International Workshop on
Visual Languages and Computing
(VLC 2009)
September 10-12, 2009
Hotel Sofitel, Redwood City, San Francisco Bay, USA
Organizers & Committees
Workshop Co-Chairs
Giuseppe Polese, University of Salerno, Italy
Giuliana Vitiello, University of Salerno, Italy
Program Chair
Gem Stapleton, University of Brighton, UK
Program Committee
Dorothea Blostein, Queen's University, Canada
Paolo Buono, University of Bari, Italy
Alfonso F. Cardenas, University of California, USA
Kendra Cooper, University of Texas at Dallas, USA
Maria Francesca Costabile, University of Bari, Italy
Gennaro Costagliola, University of Salerno, Italy
Philip Cox, Dalhousie University, Canada
Vincenzo Deufemia, University of Salerno, Italy
Stephan Diehl, University of Trier, Germany
Jing Dong, The University of Texas at Dallas, USA
Filomena Ferrucci, University of Salerno, Italy
Andrew Fish, University of Brighton, UK
Paul Fishwick, University of Florida, USA
Manuel J. Fonseca, INESC-ID, Portugal
Dorian Gorgan, Technical University of Cluj-Napoca, Romania
Corin Gurr, University of Reading, UK
Tracy Hammond, Texas A&M University, USA
Maolin Huang, University of Technology, Sydney, Australia
Erland Jungert, Linkoping University, Sweden
Lars Knipping, Technische Universitat Berlin, Germany
Hideki Koike, University of Electro-Communications Tokyo, Japan
Jun Kong, North Dakota State University, USA
Zenon Kulpa, Institute of Fundamental Technological Research, Poland
Robert Laurini, University of Lyon, France
Benjamin Lok, University of Florida, USA
Kim Marriott, Monash University, Australia
Rym Mili, University of Texas at Dallas, USA
Piero Mussio, University of Milan, Italy
Luca Paolino, University of Salerno, Italy
Joseph J. Pfeiffer, New Mexico State University, USA
Beryl Plimmer, University of Auckland, New Zealand
Giuseppe Polese, University of Salerno, Italy
Steven P. Reiss, Brown University, USA
Gem Stapleton, University of Brighton, UK
David Stotts, University of North Carolina, USA
Nik Swoboda, Universidad Politecnica de Madrid, Spain
Athanasios Vasilakos, University of Western Macedonia, Greece
Giuliana Vitiello, University of Salerno, Italy
Kang Zhang, University of Texas at Dallas, USA
Table of Contents
Foreword …………………………………………………………………….................. iii
Organization ………………………………………………………………………….... v
Slow Intelligence Systems
Shi-Kuo Chang ……………………………………………………………………............
Geographic Visualization of Movement Patterns
Gennady Andrienko and Natalia Andrienko ……………….……………………............. xxv
Distributed Multimedia Systems - I
Demonstrating the Effectiveness of Sound Spatialization in Music and Therapeutic Applications
Masahito Hirakawa, Mirai Oka, Takayuki Koyama, Tetsuya Hirotomi …………………………
End-user Development in the Medical Domain
Maria Francesca Costabile, Piero Mussio, Antonio Piccinno, Carmelo Ardito, Barbara Rita
Barricelli, Rosa Lanzilotti ………………………………..……………….……………………
Multimedia Representation of Source Code and Software Model
Transformation from Web PSM to Code (S)
Yen-Chieh Huang, Chih-Ping Chu, Zhu-An Lin, Michael Matuschek ………………………….
Experiences with Visual Programming in Engineering Applications (S)
Valentin Plenk ……………….………..….............……………….…………………….......
Advantages and Limits of Diagramming (S)
Jaroslav Kral, Michal Zemlicka ……………….…………………….............………………
Distributed Multimedia Computing & Networks and Systems
PSS: A Phonetic Search System for Short Text Documents
Jerry Jiaer Zhang, Son T. Vuong ……………….……………………...................................
Hybrid Client-server Multimedia Streaming Assisted by Unreliable Peers
Samuel L. V. Mello, Elias P. Duarte Jr. ……………….……………………..........................
Visual Programming of Content Processing Grid
Pierfrancesco Bellini, Ivan Bruno, Paolo Nesi ……………….……………………................ 40
Interactive Multimedia Systems for Technology-enhanced Learning and Preservation
Kia Ng, Eleni Mikroyannidi, Bee Ong, Nicolas Esposito, David Giaretta ……………………….
Digital Home and HealthCare - I
LoCa – Towards a Context-aware Infrastructure for eHealth Applications
Nadine Frohlich, Andreas Meier, Thorsten Moller, Marco Savini, Heiko Schuldt, Joel Vogt …
An Intelligent Web-based System for Mental Disorder Treatment by Using Biofeedback Analysis
Bai-En Shie, Fong-Lin Jang, Richard Weng, Vincent S Tseng ………………………………….
Adaptive SmartMote in Wireless Ad-Hoc Sensor Network
Sheng-Tzong Cheng, Yao-Dong Zou, Ju-Hsien Chou, Jiashing Shih, Mingzoo Wu …………...
Digital Home and HealthCare - II
A RSSI-based Algorithm for Indoor Localization Using ZigBee in Wireless Sensor Network
Yu-Tso Chen, Chi-Lu Yang, Yeim-Kuan Chang, Chih-Ping Chu ……………………………….
A Personalized Service Recommendation System in a Home-care Environment
Chi-Lu Yang, Yeim-Kuan Chang, Ching-Pao Chang, Chih-Ping Chu ………………………….
Design and Implementation of OSGi-based Healthcare Box for Home Users
Bo-Ruei Cao, Chun-Kai Chuang, Je-Yi Kuo, Yaw-Huang Kuo, Jang-Pong Hsu ………………
Distributed Multimedia Systems - II
An Approach for Tagging 3D Worlds for the Net
Fabio Pittarello ……………….…………………….............……………….……………...
TA-CAMP Life: Integrating a Web and a Second Life Based Virtual Exhibition
Andrea De Lucia, Rita Francese, Ignazio Passero, Genoveffa Tortora …………………………
Genomena: a Knowledge-based System for the Valorization of Intangible Cultural Heritage
Paolo Buono, Pierpaolo Di Bitonto, Francesco Di Tria, Vito Leonardo Plantamura …………..
Technologies for Digital Television
Video Quality Issues for Mobile Television
Carlos D. M. Regis, Daniel C. Morais, Raissa Rocha, Marcelo S. Alencar, Mylene C. Q. Farias
Comparing the "Eco Controllo"'s Video Codec with Respect to MPEG4 and H264
Claudio Cappelli ……………….…………………….............……………….……………..
An Experimental Evaluation of the Mobile Channel Performance of the Brazilian Digital
Television System
Carlos D. M. Regis, Marcelo S. Alencar, Jean Felipe F. de Oliveira ……………………………
Emergency Management and Security
Decision Support for Monitoring the Status of Individuals
Fredrik Lantz, Dennis Andersson, Erland Jungert, Britta Levin ………………………………... 123
Assessment of IT Security in Emergency Management Information Systems (S)
Johan Bengtsson, Jonas Hallberg, Thomas Sundmark, Niklas Hallberg ……………………….
Practical Experiences in Using Heterogeneous Wireless Networks for Emergency Response
Services (S)
Miguel A. Sanchis, Juan A. Martinez, Pedro M. Ruiz, Antonio F. Gomez-Skarmeta, Francisco
Rojo ……………….…………………….............……………….……………………........
F-REX: Event Driven Synchronized Multimedia Model Visualization (S)
Dennis Andersson ……………….…………………….............……………….…………...
Towards Integration of Different Media in a Service-oriented Architecture for Crisis
Management (S)
Magnus Ingmarsson, Henrik Eriksson, Niklas Hallberg ………………………………………...
Distributed Multimedia Systems - III
An Analysis of Two Cooperative Caching Techniques for Streaming Media in Residential
Neighborhoods (S)
Shahram Ghandeharizadeh, Shahin Shayandeh, Yasser Altowim ………………………………
PopCon Monitoring: Web Application for Detailed Real-time Database Transaction Monitoring
Ignas Butenas, Salvatore Di Guida, Michele de Gruttola, Vincenzo Innocente, Antonio Pierro .
Distributed Multimedia Systems - IV
Using MPEG-21 to Repurpose, Distribute and Protect News/NewsML Information
Pierfrancesco Bellini, Ivan Bruno, Paolo Nesi …………………………………………………...
Activity-oriented Web Page Retrieval by Reflecting Human Traffic in the Real World
Atsuo Yoshitaka, Noriyoshi Kanki, Tsukasa Hirashima …………………………………………
An Architecture for User-centric Identity, Profiling and Reputation Services (S)
Gennaro Costagliola, Rosario Esposito, Vittorio Fuccella, Francesco Gioviale ………………...
Distributed Multimedia Systems - V
The ENVISION Project: Towards a Visual Tool to Support Schema Evolution in Distributed
Giuseppe Polese, Mario Vacca …………………….……………….…………………….........
Towards Synchronization of a Distributed Orchestra (S)
Angela Guercio, Timothy Arndt …………………………………………………………………...
Semantic Composition of Web Services (S)
Manuel Bernal Llinares, Antonio Ruiz Martinez, MA Antonia Martinez Carreras, Antonio F.
Gomez Skarmeta ……………….…………………….............……………….…………….. 186
DET Workshop
Eclipse and Jazz Technologies for E-learning
Eclipse: a New Way to Mashup
Paolo Maresca, Giuseppe Marco Scarfogliero, Lidia Stanganelli ……………………………….
Mashup Learning and Learning Communities
Luigi Colazzo, Andrea Molinari, Paolo Maresca, Lidia Stanganelli …………………………….
J-META: a Language to Describe Software in Eclipse Community
Pierpaolo Di Bitonto, Paolo Maresca, Teresa Roselli, Veronica Rossano, Lidia Stanganelli …..
Providing Instructional Guidance with IMS-LD in COALA, an ITS for Computer Programming
Learning (s)
Francisco Jurado, Miguel A. Redondo, Manuel Ortega …………………………………………
Learning Objects: Methodologies, Technologies and Experiences
Deriving Adaptive Fuzzy Learner Models for Learning-object Recommendation
G. Castellano, C. Castiello, D. Dell'Agnello, C. Mencar, M.A. Torsello …………………………
Adaptive Learning Using SCORM Compliant Resources
Lucia Monacis, Rino Finamore, Maria Sinatra, Pierpaolo Di Bitonto, Teresa Roselli, Veronica
Rossano ……………….…………………….............……………….……………………..
Organizing the Multimedia Content of an M-Learning Service through Fedora Digital Objects
C. Ardito, R. Lanzilotti ……………….…………………….............……………….……….
Enhancing Online Learning Through Instructional Design: a Model for the Development of ID-based Authoring Tools
Giovanni Adorni, Serena Alvino, Mauro Coccoli ……………….……………………............
Learning Objects Design for a Databases Course (s)
Carlo Dell'Aquila, Francesco Di Tria, Ezio Lefons, Filippo Tangorra ………………………….
E-learning and The Arts
A Study of 'Health Promotion Course for Music Performers' Distance-learning Course
Yu-Huei Su, Yaw-Jen Lin, Jer-Junn Luh, Heng-Shuen Chen …………………………………..
Understanding Art Exhibitions: from Audioguides To Multimedia Companions
Giuseppe Barbieri, Augusto Celentano, Renzo Orsini, Fabio Pittarello …………………………
A Pilot Study of e-Music School of LOHAS Seniors in Taiwan
Chao-Hsiu Lee, Yen-Ting Chen, Yu-Yuan Chang, Yaw-Jen Lin, Jer-Junn Luh, Hsin-I Chen ..
Sakai 3: A New Direction for an Open Source Academic Learning and Collaboration Platform
Michael Korcuska ……………….…………………….............……………….…………...
Concept Map Supported E-learning Implemented on Knowledge Portal Systems
Jyh-Da Wei, Tai-Yu Chen, Tsai-Yeh Tung, D. T. Lee ……………….…………………….....
An Implementation of the Tools in the Open-source Sakai Collaboration and Learning
Environment (s)
Yasushi Kodama, Tadashi Komori, Yoshikuni Harada, Yashushi Kamayashi, Yuji Tokiwa,
Kazuo Yana ……………….…………………….............……………….………………….
A 3-D Real-time Interactive Web-cast Environment for E-collaboration in Academia and
Education (s)
Billy Pham, Ivan Ho, Yoshiyuki Hino, Yasushi Kodama, Hisato Kobayashi, Kazuo Yana ……..
Applying Flow Theory to the Evaluation of the Quality of Experience in a Summer School
Program Involving E-interaction (s)
Kiyoshi Asakawa, Kazuo Yana ……………….…………………….............………………..
VLC Workshop
Visual Analytics - I
Extracting Hot Events from News Feeds, Visualization, and Insights
Zhen Huang, Alfonso F. Cardenas ……………….……………………................................. 287
Visual Analysis of Spatial Data through Maps of Chorems
Davide De Chiara, Vincenzo Del Fatto, Robert Laurini, Monica Sebillo, Giuliana Vitiello ……
Software Visualization Using a Treemap-hypercube Metaphor (s)
Amaia Aguirregoitia, J. Javier Dolado, Concepcion Presedo ……………………………………
Visual Interactive Exploration of Spatio-temporal Patterns (s)
Radoslaw Rudnicki, Monika Sester, Volker Paelke ………………………………………………
Visual Languages and Environments for Software Engineering
On the Usability of Reverse Engineering Tools
F. Ferrucci, R. Oliveto, G. Tortora, G. Vitiello, S. Di Martino …………………………………..
A Methodological Framework to the Visual Design and Analysis of Real-Time Systems
Kawtar Benghazi, Miguel J. Hornos, Manuel Noguera, Maria J. Rodriguez …………………...
Visualizing Pointer-related Data Flow Interactions (s)
Marcel Karam, Marwa El-Ghali, Hiba Halabi …………………………………………………...
Visual Semantics, Tools and Layout
A Graphical Tool to Support Visual Information Extraction
Giuseppe Della Penna, Daniele Magazzeni, Sergio Orefice ……………………………………...
Rule-based Diagram Layout Using Meta Models
Sonja Maier, Mark Minas …………………………………………………………………………
Chorem Maps: towards a Legendless Cartography?
Robert Laurini, Francoise Raffort, Monica Sebillo, Genoveffa Tortora, Giuliana Vitiello …….
Sketch Computing
Preserving the Hand-drawn Appearance of Graphs
Beryl Plimmer, Helen Purchase, Hong Yu Yang, Laura Laycock ……………………………….
ReCCO: An Interactive Application for Sketching Web Comics
Ricardo Lopes, Manuel J. Fonseca, Tiago Cardoso, Nelson Silva ………………………………
Performances of Multiple-Selection Enabled Menus in Soft Keyboards
Gennaro Costagliola, Vittorio Fuccella, Michele Di Capua, Giovanni Guardi …………………
SOUSA v2.0: Automatically Generating Secure and Searchable Data Collection Studies (s)
Brandon L. Kaster, Emily R. Jacobson, Walter Moreira, Brandon Paulson, Tracy A.
Hammond …………………………………………………………………………………………..
Visual Analytics - II
Visualizing Data to Support Tracking in Food Supply Chains
Paolo Buono, Adalberto L. Simeone, Carmelo Ardito, Rosa Lanzilotti ………………………….
A Methodological Framework for Automatic Clutter Reduction in Visual Analytics
Enrico Bertini, Giuseppe Santucci ………………………………………………………………...
Reviewer's Index …………………………………………………………………………………..
Author's Index …………………………………………………………………………………….
Note: (S) means short paper.
Keynote I:
Slow Intelligence Systems
Shi-Kuo Chang
In this talk I will introduce the concept of slow intelligence. Not all intelligent systems are
fast. There are a surprisingly large number of intelligent systems, quasi-intelligent systems
and semi-intelligent systems that are slow. Such slow intelligence systems are often
neglected in mainstream research on intelligent systems, but they are really worthy of our
attention and emulation. I will discuss the general characteristics of slow intelligence
systems and then concentrate on evolutionary query processing for distributed multimedia
systems as an example of artificial slow intelligence systems.
About Shi-Kuo Chang
Dr. Chang received the B.S.E.E. degree from National Taiwan University in 1965. He
received the M.S. and Ph.D. degrees from the University of California, Berkeley, in 1967
and 1969, respectively. He was a research scientist at IBM Watson Research Center from
1969 to 1975. From 1975 to 1982 he was Associate Professor and then Professor at the
Department of Information Engineering, University of Illinois at Chicago. From 1982 to
1986 he was Professor and Chairman of the Department of Electrical and Computer
Engineering, Illinois Institute of Technology. From 1986 to 1991 he was Professor and
Chairman of the Department of Computer Science, University of Pittsburgh. He is currently
Professor and Director of the Center for Parallel, Distributed and Intelligent Systems,
University of Pittsburgh. Dr. Chang is a Fellow of the IEEE. He has published over 230 papers and
16 scientific books. He is the founder and co-editor-in-chief of the international journal,
Visual Languages and Computing, published by Academic Press, the editor-in-chief of the
international journal, Software Engineering & Knowledge Engineering, published by World
Scientific Press, and the co-editor-in-chief of the international journal on Distance
Education Technologies. Dr. Chang pioneered the development of Chinese language
computers, and was the first to develop a picture grammar for Chinese ideographs, and
invented the phonetic phrase Chinese input method.
Dr. Chang's literary activities include the writing of over thirty novels, collections of short
stories and essays. He is widely regarded as an acclaimed novelist in Taiwan. His novel, The
Chess King, was translated into English and German, made into a stage musical, then a TV
mini-series and a movie. It was adopted as a textbook for foreign students studying Chinese at
the Stanford Center (Inter-University Program for Chinese Language Studies administered
by Stanford University), Taipei, Taiwan. In 1992, Chess King was adopted as
supplementary reading for high school students in Hong Kong. The short story, "Banana
Boat", was included in a textbook for advanced study of Chinese edited by Neal Robbins
and published by Yale University Press. The University of Illinois adopted "The Amateur
Cameraman" in course materials for studying Chinese. Dr. Chang is also regarded as the
father of science fiction in Taiwan. Some of Dr. Chang's SciFi short stories have been
translated into English, such as "City of the Bronze Statue", "Love Bridge”, and "Returning”.
His SciFi novel, The City Trilogy, was published by Columbia University Press in May
Keynote II:
Geographic Visualization of Movement Patterns
Gennady Andrienko and Natalia Andrienko
We present our recent results in visualization and visual analytics of movement data. The
GeoPKDD project (Geographic Privacy-aware Knowledge Discovery and Delivery) and
the recently started DFG project ViAMoD (Visual Spatiotemporal Pattern Analysis of
Movement and Event Data) have brought into existence an array of new methods enabling
the analysis of very large collections of movement data. Some of the methods are
applicable even to data that do not fit in the computer's main memory. These include the
techniques for database aggregation, cluster-based classification, and incremental
summarization of trajectories. The remaining methods can deal with data that fit in the
main memory but are too big for the traditional visualization and interaction techniques.
Among these methods are interactive visual cluster analysis of trajectories and dynamic
aggregation of movement data. The visual analytics methods are based on the interplay of
computational algorithms and interactive visual interfaces, which support the involvement
of human capabilities for pattern recognition, association, interpretation, and reasoning.
The projects have also moved forward the theoretical basis for visual analytics methods for
movement data. We discuss analysis tasks and problems requiring further research.
About Gennady Andrienko
Gennady Andrienko received his Master's degree in Computer Science from Kiev State
University in 1986 and Ph.D. equivalent in Computer Science from Moscow State
University in 1992. He undertook research on knowledge-based systems at the Mathematics
Institute of Moldavian Academy of Sciences (Kishinev, Moldova), then at the Institute on
Mathematical Problems of Biology of Russian Academy of Science (Pushchino Research
Center, Russia). Since 1997 Dr. Andrienko has held a research position at GMD, now the Fraunhofer
Institute for Intelligent Analysis and Information Systems (IAIS). He is a co-author of the
monograph "Exploratory Analysis of Spatial and Temporal Data", 30+ peer-reviewed
journal papers, 10+ book chapters, and 100+ papers in conference proceedings. He has been
involved in numerous international research projects. His research interests include
geovisualization, information visualization with a focus on spatial and temporal data, visual
analytics, interactive knowledge discovery and data mining, spatial decision support and
International Conference on
Distributed Multimedia Systems
(DMS 2009)
Augusto Celentano, Universita Ca Foscari di Venezia, Italy
Atsuo Yoshitaka, JAIST, Japan
Demonstrating the Effectiveness of Sound Spatialization in Music and
Therapeutic Applications
Masahito Hirakawa, Mirai Oka, Takayuki Koyama, and Tetsuya Hirotomi
Interdisciplinary Faculty of Science and Engineering, Shimane University, Japan
{hirakawa, hirotomi}
In those trials, sound patterns or notes are a matter
of concern. While they give the user a great impact in
understanding the associated events, the spatial
position of sounds influences the user’s understanding
as well [7].
Stereo and 5.1-channel surround systems which
have been used widely make it possible for the listener
to feel the sound position. It should be mentioned that,
however, the best spot for listening is fixed in those
settings. If the listener is out of the spot, a reality of the
sound space cannot be maintained any more. Due to
this fact, those systems are suitable for the application
where a limited number of listeners sit in a limited
In collaborative or multi-user computing
environments, the system should support a mechanism
that each of the users can catch where sounds are
placed, irrelevant to his/her standing position and
The authors have investigated a tabular sound
system for a couple of years [8], [9]. The system is
equipped with a meter square table in which 16
speakers are placed in a 4 x 4 grid layout. Multiple
sound streams can be presented simultaneously by
properly controlling the loudness for those speakers.
Additionally, computer generated graphical images are
projected on its surface. We call this table "Sound
Table." Users who surround the table can feel spatial
sounds with the associated images. In addition, a
special stick-type input device is provided for
specification of commands. It is important to note that
the users do not need to wear any special devices for
interacting with the system.
In this paper we present applications of the system
to sound mashup and reminiscence/life review, in
order to demonstrate the effectiveness of sound
spatialization in collaborative work environments.
Most of the existing computer systems express
information visually. While vision plays an important
role in interaction between the human and the
computer, it is not the only channel. We have been
investigating a multimedia system which is capable of
controlling the spatial position of sounds on a twodimensional table.
In this paper we present applications of the system
to sound mashup and reminiscence/life review, in
order to demonstrate the effectiveness of sound
spatialization in collaborative work environments.
Users can collaborate with each other with the help of
sound objects which are spatialized on the table, in
addition to graphical images.
1. Introduction
Multimedia is a basis of modern computers. In fact
a variety of studies have been investigated so far.
Graphical user interfaces, or visual languages in a
broader sense, are one such example toward
development of advanced computers in the early days
of multimedia research. Since humans are sensitive to
vision, it is natural that our attention had been paid to
the use of visual information in interaction between the
user and the computer.
Meanwhile, audition is another important channel
for interaction. The idea of the so-called earcon [1] was
first proposed to present specific items or events by
means of abstract patterns in the loudness, pitch, or timbre
of sounds. Auditory interfaces have since been actively
studied in applications such as menu navigation
[2], mobile service notifications [3], [4], mobile games
[5], and human movement sonification [6].
Mr. Koyama is now with ICR, Japan.

2. Related Work

Sound spatialization has been studied actively in the
human-computer interaction domain [10]. One
practical example is a computer game named "Otogei,"
produced by Bandai, in which the player wears
headphones and tries to attack approaching enemies
by relying on stereo sound alone. [11] - [13] presented
sound-based guidance systems which guide a user to a
desired target location by varying the loudness and
balance of a played sound. There are other
approaches that use sounds to assist with, for
example, car driving [14], mail browsing [15],
geographical map navigation [16], and object finding
in 3D environments [17].

Those systems assume headphones or
specially designed hardware as the interaction device.
Each user is separated from the others and
hears a different sound even when multiple people
participate in a common session. This feature is
advantageous in some cases, but it is not desirable for
collaborative work environments.

[18] conducted experiments on the use of non-speech
audio at an interactive multi-user tabletop
display under two different setups. One is localized
sound, where each user has his or her own speaker;
the other is coded sound, where users share one
speaker but the waveforms of the sounds are varied so that
a different sound is played for each user. This
approach could be one practical solution for
business-oriented applications, but it is not sufficient for
sound-centric applications (e.g., computer music).

Transition Soundings [19] and Orbophone [20] are
specialized interfaces using multiple speakers for
interactive music making. A large number of speakers
are mounted in a wall-shaped board in Transition
Soundings, while Orbophone houses multiple speakers
in a dodecahedral enclosure. Both systems are
deployed for sound art.

Other related approaches using multi-channel
speakers appear in [21], [22]. While they provide
sophisticated functionality, their system settings are
rather complex and specialized. As will be explained in
the next section, we use conventional speakers and
sound boards, and no specialized hardware at all.

3. Tabular Sound Spatialization System

The sound spatialization system [8], [9] we have
developed as a platform for sound-based collaborative
applications is organized around Sound Table as its
central equipment, together with a pair of cameras, a
video projector, and a PC. Figure 1 shows its physical
setup (the PC is not shown).

Figure 1. Sound spatialization system

Sound Table is a physical table in which 16
speakers are mounted in a 4 x 4 matrix, as shown in
Fig. 2. It is 90 cm in width and depth and 73 cm in
height. Two 8-channel audio interfaces (M-AUDIO
FireWire 410) are attached to the PC and connected to
Sound Table through a 16-channel amplifier. Multiple
sounds can be output at one time at different positions.

Figure 2. Sound Table

We have analyzed through experiments how accurate
the sound positioning is, that is, the error in
distance between the simulated sound position and the
perceived sound position. The average error of sound
position identification for moving sounds is 0.52 in
the horizontal direction and 0.72 in the depth direction,
where the values are normalized by the distance between
two adjacent speakers (24 cm), i.e., about 12 cm and
17 cm, respectively. Further details are given in [8].

The surface of Sound Table is covered with a white
cloth so that computer-generated graphical images can be
projected onto it. Multiple users can interact with the
system through both auditory and visual channels.

A stick-type input device whose base unit is a
Nintendo Wii Remote is provided, as shown in Fig. 3.
The position and posture of the device in the 3D space
over Sound Table are captured by the system, as well
as button presses.
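The sound positioning by loudness control over the 16 speakers can be made concrete with a small sketch. The following is our own illustration of equal-power bilinear panning onto a 4 x 4 grid, not the algorithm actually used in the system (see [8] for that); all names and parameters are ours.

```python
import math

def speaker_gains(x, y, grid=4):
    """Distribute a virtual source at (x, y) over a grid x grid speaker
    matrix.  Coordinates are in speaker units (0 .. grid-1); the four
    speakers surrounding the source get bilinear weights, square-rooted
    so that the total acoustic power stays roughly constant."""
    x = min(max(x, 0.0), grid - 1.0)
    y = min(max(y, 0.0), grid - 1.0)
    c0, r0 = int(x), int(y)                      # lower corner speaker
    c1, r1 = min(c0 + 1, grid - 1), min(r0 + 1, grid - 1)
    fx, fy = x - c0, y - r0                      # position inside the cell
    gains = {}
    for key, w in [((r0, c0), (1 - fx) * (1 - fy)),
                   ((r0, c1), fx * (1 - fy)),
                   ((r1, c0), (1 - fx) * fy),
                   ((r1, c1), fx * fy)]:
        gains[key] = gains.get(key, 0.0) + w     # corners may coincide
    return {k: math.sqrt(w) for k, w in gains.items() if w > 0}
```

Under this scheme a source exactly on a speaker drives only that speaker, while a source midway between two speakers splits the power equally between them.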
Figure 3. Sticky input device

Meanwhile, the task of identifying the user's gestures,
which include tap, sting and release, attack, flick, and
tilt, is separated from that of interpreting their semantic
meanings in a particular application, so that application
development becomes easier. We adopted the OSC
(Open Sound Control) protocol for communication of
messages among processing modules. For details of
the software development framework, please refer to

We first implemented a simple music application
on the system in order to demonstrate its
functionality [9]. In this paper we show more practical
applications in which the sound spatialization facility
plays a significant role.

4. Music Mashup Application

4.1 Background

In computer music, people are interested in creating
and performing music, and musical instruments
augmented by digital technologies have been proposed.
TENORI-ON [23] is one example: it offers a "visible
music" interface in which a 16 x 16 matrix of
touch-sensitive LED switches allows a user to play music
intuitively. Some researchers put emphasis on the
instrument part (e.g., [24], [25]), while others focus
on the user interface, where tactile, gestural, or
multimodal features are emphasized (e.g., [26], [27]).

The trials mentioned above focus on interactive
music composition. There have been few trials
allowing the user to enjoy manipulating the spatial
position of sound sources (e.g., virtual music
performers), although this is of great importance for
attaining a sense of reality [10]. Pinocchio [28] and the
system exhibited at the Sony ExploraScience museum
are examples which emphasize localization of sound.

Meanwhile, the online music software Massh! [29]
inspired us with its distinguished functionality and
interactive features. It enables users to mix sound
samples or loops to make a new song (i.e., a mashup).
Furthermore, its visual user interface is highly
interactive. Sound loops are graphically represented on
the screen as rotating circular waveforms. They can
form a group (i.e., a mix), whose members are played
in sync with each other.

Sound loops are presented in Massh! as visual clues,
but no sound spatialization is available. We consider
adding a sound spatialization facility for more
attractive music mashup.

4.2 Design policy

Several different interface designs for music
mashup are conceivable in our system setting.

One possibility is that, considering that music loops are
time-based media and only one part of a loop is played
at a time, a music loop/sample is represented in the form
of a timeline with a slider showing which part of the
loop/sample is being played. Multiple sliders may be
assigned to one music loop/sample, allowing the player
to have a composition that employs a melody with one
or more imitations of the melody played after a given
duration, that is, a canon.

We instead take another approach in which music
is organized by multiple moving sound objects which
correspond to sound samples or loops. While no
play-position control is available for the objects, flexibility
is given to them with respect to their moving paths. This
fits well with our system architecture.

Here, in order to produce variation in the generated
sounds, we prepare two path patterns: a straight line and a
circular line. Multiple sound objects may be associated
with one path. When a sound object comes to a
crossing point where two or more paths overlap,
the object may change its path to another.
4.3 Implementation

We have built an actual music mashup application
on the tabular sound spatialization system.

First, the user determines a path for sound object(s)
on the table by means of the gestures explained below.
For specification of a straight line, the user touches
the stick device at a starting point on the table and
then brings it to the desired terminal position while
keeping its head on the table. The straight line has a
triangular handle at each end (see Fig. 4), and the user
can change the length and angle of a line by
manipulating its handles.

Figure 4. Specification of a straight line path

A circular line, on the other hand, can be generated
by bringing one handle of a predefined straight line
close to the other handle, as shown in Fig. 5(a). The
user is allowed to modify the position and size of a
circular line by dragging its center marker and a special
marker on the line, respectively (see Fig. 5(b)).

Figure 5. Specification of a circular line path

Music starts by generating sound objects on the
table. A sound object is generated by tilting the stick
device while pushing a button on the device.
Graphically, a sound object takes a circular shape with
a certain color and size. The color corresponds to a
sound sample/loop, while the size corresponds to its
loudness. The size of a sound object is determined,
when the object is instantiated, by the position of the
stick device in the 3D space over the table: the higher
the spatial position, the larger the circle and thus the
louder the generated sound.

When a sound object is placed on a line, it starts
moving along the line, and users enjoy feeling the
movement of the sound. In the present implementation,
a change of path from one line to another at a crossing
point happens with a certain probability. Figure 6
shows such examples. Furthermore, the user is allowed
to take a sound object to another position after its
creation. If the object is placed at a position where no
path line exists, it keeps its position and does not move.

Figure 6. Change of a path

Meanwhile, when the device is swung down over a
sound object which is sounding, the sound is
terminated. At the same time, the object's color turns
black so that the change is visible. If the gesture is
applied again, the object restarts sounding.

When the user handles a sound object on the table
with a light quick blow, it flows out of the table with
graphical effects - i.e., the object is broken into
segments. Semantically this means deletion of the
object.

Figure 7 shows a snapshot of the system in use.
Multiple users can play collaboratively with each other.
Manipulation of sound objects and lines, which may
have been specified by other users, changes the sounds
in real time. This notifies each user of the others' play
and stimulates him/her to react.

Figure 7. Collaborative play with the system

Having a feeling of sound movement is attractive
and fun in the music application realized in our trial.
Here we noticed the importance of authoring effective
content to give users a better impression of their
performance. Experimental evaluation of the
usefulness of the proposed music application still
remains to be done.
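The object-generation mapping in Section 4.3 (the higher the stick, the larger and louder the object) could be realized by a clamped linear map such as the sketch below; all numeric ranges here are illustrative assumptions, not values from the paper.

```python
def object_size_and_gain(stick_height_cm,
                         h_min=5.0, h_max=50.0,    # usable height range (assumed)
                         r_min=2.0, r_max=10.0):   # circle radius range (assumed)
    """Map the stick-tip height over the table to a circle radius (cm)
    and an amplitude gain in [0, 1]; both grow linearly with height."""
    t = (stick_height_cm - h_min) / (h_max - h_min)
    t = min(max(t, 0.0), 1.0)                 # clamp outside the range
    radius = r_min + t * (r_max - r_min)      # larger circle when held higher
    return radius, t                          # gain == t: louder when higher
```

Clamping keeps gestures just above the table or far above it within the valid radius and gain ranges.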
5. Supporting Reminiscence for Older People

5.1 Background

Reminiscence therapy is a psychosociological
therapeutic approach to the care of older people [30],
[31]. Older people recall various experiences from
their past life and share them with others to facilitate
pleasure, quality of life, emotional stability or
adaptation to present circumstances, and to reduce
isolation and depression.

In practice, due to the rapid increase in the elderly
population, interest in reminiscence therapy has
continued to grow. Trials have been carried out in
hospitals, day care, nursing homes, and other settings,
where reminiscence therapy is usually conducted in a
group guided by an experienced staff member.

In a reminiscence session, the staff member shows
visual media such as photographs and pictures as
clues. Other media, including music, smell, and touch,
may be used as well to make the session successful.
[32] and [33] present computer-based multimedia
conversation aids in which audio, video, animation
and/or QuickTime VR are utilized.

It is notable that, in the existing trials of reminiscence
therapy with music, only songs or melodies have been
a matter of concern. We expect that the position of
sounds and its movement can considerably help people
recall experiences and then initiate speech.

5.2 Design policy

We consider that there are two key points in the
development of a computer-assisted system
implementing reminiscence for practical use.

One is the friendliness and effectiveness of the
system for the participants (older people). They are
often unwilling to use a computer and, thus, the
user interface should be natural and simple.

The other concerns the utility of the system for the
experienced staff member who guides older people in
reminiscence. There are demands for helping him/her
in the creation of reminiscence materials and in
gathering data useful for analysis of the session, for
example, how long each of the participants spoke and
which topics he/she was interested in.

In this trial, we consider issues of a multimodal
interface for the creation and play of reminiscence
materials. A facility for recording and analyzing
activities presented by participants will be reported.

The interface needs to provide a facility to place a
sound at any position on a picture and to specify its
arbitrary movement on the picture as, for example, a
child runs around a playground. The specification
should be understandable so that the user can edit it.
Here, simplicity is of vital importance in its design.

5.3 Implementation

For the creation of a reminiscence material, the staff
member first selects a picture from a database and then
assigns sound objects to it by manipulating the sticky
device. Each of the created sound objects is visualized
as an icon so that the staff member can easily identify
the position and other states of the object, as shown in
Fig. 8. Those states include mobility (moving or stable
object) and sound existence (on or off). When the staff
member drags the icon (i.e., sound object) on the table
by using the sticky device, its path is recorded as
traversed. He/she may repeat the tasks explained above
to define a complete set of reminiscence material.

Figure 8. Assignment of sound objects (icons associated with sound objects)

Once the specification is completed, the material is
ready to play. The staff member can switch from one
picture (with sounds) to another by pressing a button
on the sticky device. Icons as sound markers are no
longer displayed during playback.
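Recording a dragged icon's path "as traversed" (Section 5.3) amounts to storing timestamped positions so that playback can reproduce the original motion. The sketch below is our own illustration of such a recorder, not the system's actual data model.

```python
import time

class PathRecorder:
    """Stores (timestamp, x, y) samples while an icon is dragged and
    replays them as (delay, x, y) steps at the original speed."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.samples = []                     # list of (t, x, y)

    def add(self, x, y):
        """Record the icon's current position."""
        self.samples.append((self.clock(), x, y))

    def replay(self):
        """Yield (seconds_to_wait, x, y) for each recorded sample."""
        prev_t = None
        for t, x, y in self.samples:
            yield (0.0 if prev_t is None else t - prev_t, x, y)
            prev_t = t
```

Injecting the clock keeps the recorder testable and lets playback use the inter-sample delays to move the sound object at the speed it was originally dragged.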
Meanwhile, a preliminary evaluation of the system
has been conducted. A group of three university
students participated in a test in which they were asked
to hold a reminiscence session using the system. We
compared the system in two settings (with and without
sounds) by means of a questionnaire with three
questions: "Was the communication lively?", "Was it
helpful in initiating speech?", and "Which setting is
better?"

All of the subjects gave higher scores to the setting
with sounds than to the one without. In addition, the
following opinions were given by the subjects:

- The session with sounds stimulated reminiscence.
- Combining background sounds with foreground
sounds, which are listened to consciously, was also
suggested.

Though further detailed experiments must be
conducted, this system setup would be of help in
performing reminiscence therapy. The usefulness of the
authoring facilities for the experienced staff also needs
to be investigated.
In the current implementation, we assume static
images. We will investigate an extension so that videos
can also be used as a medium for reminiscence therapy.
The system should then provide a facility by which
sound objects follow target objects in a video. Of
course, an experienced staff member does not want to
learn complex authoring operations, so the interface
must be designed so that authoring such dynamic
content is not difficult. A mechanism of video editing
based on object movement, which one of the authors
proposed before [34], would be helpful in this
development.
6. Conclusions

We investigated in this paper how sound positioning
serves as an effective technique for the implementation
of advanced computer applications. As practical
examples, two applications, music mashup and
reminiscence support, were presented, both
implemented on top of the tabular sound spatialization
system we developed before. Users can collaborate
with each other with the help of sound objects which
are spatialized on the table, in addition to graphical
images.

Further studies remain, including synchronization of
sound objects running on the same path in the music
mashup, and user tests with older people in the
reminiscence application.

This work has been supported in part by the
Ministry of Education, Science, Sports and Culture,
Grant-in-Aid for Scientific Research, 20500481, 2008.

References

[1] M. M. Blattner, "Multimedia Interface Design",
Addison-Wesley Pub., 1992.
[2] P. Yalla and B. N. Walker, "Advanced Auditory
Menus: Design and Evaluation of Auditory Scroll
Bars," Proc., Int'l ACM SIGACCESS Conf. on
Computers and Accessibility, pp.105-112, 2008.
[3] S. Garzonis, C. Bevan, and E. O'Neill, "Mobile
Service Audio Notifications: Intuitive Semantics and
Noises," Proc., ACM Australasian Conf. on
Computer-Human Interaction: Designing for Habitus and
Habitat, pp.156-163, 2008.
[4] E. Hoggan and S. Brewster, "Designing Audio and
Tactile Crossmodal Icons for Mobile Devices," Proc.,
ACM Int'l Conf. on Multimodal Interfaces, pp.162-169,
2007.
[5] I. Ekman, L. Ermi, J. Lahti, J. Nummela, P.
Lankoski, and F. Mäyrä, "Designing Sound for a
Pervasive Mobile Game," Proc., ACM SIGCHI Int'l
Conf. on Advances in Computer Entertainment
Technology, pp.110-116, 2005.
[6] A. O. Effenberg, "Movement Sonification: Effects
on Perception and Action," IEEE MultiMedia, Vol.12,
No.2, pp.53-59, Apr.-June 2005.
[7] J. J. Nixdorf and D. Gerhard, "RITZ: A Real-Time
Interactive Tool for Spatialization", Proc., ACM Int'l
Conf. on Multimedia, pp.687-690, 2006.
[8] T. Nakaie, T. Koyama, and M. Hirakawa,
"Development of a Collaborative Multimodal System
with a Shared Sound Display", Proc., IEEE Conf. on
Ubi-Media Computing, pp.14-19, 2008.
[9] T. Nakaie, T. Koyama, and M. Hirakawa, "A
Table-based Lively Interface for Collaborative Music
Performance", Proc., Int'l Conf. on Distributed
Multimedia Systems, pp.184-189, 2008.
[10] H. J. Song and K. Beilharz, "Aesthetic and
Auditory Enhancements for Multi-stream Information
Sonification", Proc., Int'l Conf. on Digital Interactive
Media in Entertainment and Arts, pp.224-231, 2008.
[11] S. Strachan, P. Eslambolchilar, and R. Murray-Smith,
"gpsTunes: Controlling Navigation via Audio
Feedback", Proc., ACM MobileHCI'05, pp.275-278,
2005.
[12] M. Jones and S. Jones, "The Music is the
Message", ACM interactions, Vol.13, No.4, pp.24-27,
July-Aug. 2006.
[13] J. Dodiya and V. N. Alexandrov, "Use of
Auditory Cues for Wayfinding Assistance in Virtual
Environment: Music Aids Route Decision," Proc.,
ACM Symp. on Virtual Reality Software and
Technology, pp.171-174, 2008.
[14] J. Sodnik, S. Tomazic, C. Dicke, and M.
Billinghurst, "Spatial Auditory Interface for an
Embedded Communication Device in a Car," Proc.,
IEEE Int'l Conf. on Advances in Computer-Human
Interaction, pp.69-76, 2008.
[15] D. I. Rigas and D. Memery, "Multimedia E-Mail
Data Browsing: The Synergistic Use of Various Forms
of Auditory Stimuli," Proc., IEEE Int'l Conf. on
Communications, pp.582-588, 2003.
[16] H. Zhao, B. K. Smith, K. Norman, C. Plaisant,
and B. Shneiderman, "Interactive Sonification of
Choropleth Maps," IEEE MultiMedia, Vol.12, No.2,
pp.26-35, Apr.-June 2005.
[17] K. Crommentuijn and F. Winberg, "Designing
Auditory Displays to Facilitate Object Localization in
Virtual Haptic 3D Environments," Proc., Int'l ACM
SIGACCESS Conf. on Computers and Accessibility,
pp.255-256, 2006.
[18] M. S. Hancock, C. Shen, C. Forlines, and K.
Ryall, "Exploring Non-Speech Auditory Feedback at
an Interactive Multi-User Tabletop", Proc., Graphics
Interface 2005, pp.41-50, 2005.
[19] D. Birchfield, K. Phillips, A. Kidane, and D.
Lorig, "Interactive Public Sound Art: A Case Study",
Proc., Int'l Conf. on New Interfaces for Musical
Expression, pp.43-48, 2006.
[20] D. Lock and G. Schiemer, "Orbophone: A New
Interface for Radiating Sound and Image", Proc., Int'l
Conf. on New Interfaces for Musical Expression,
pp.89-92, 2006.
[21] T. Ogi, T. Kayahara, M. Kato, H. Asayama, and
M. Hirose, "Immersive Sound Field Simulation in
Multi-screen Projection Displays", Proc., Eurographics
Workshop on Virtual Environments, pp.135-142, 2003.
[22] C. Ramakrishnan, J. Goßmann, and L. Brümmer,
"The ZKM Klangdom", Proc., Int'l Conf. on New
Interfaces for Musical Expression, pp.140-143, 2006.
[24] Y. Takegawa, M. Tsukamoto, T. Terada, and S.
Nishio, "Mobile Clavier: New Music Keyboard for
Flexible Key Transpose", Proc., Int'l Conf. on New
Interfaces for Musical Expression, pp.82-87, 2007.
[25] D. Overholt, "The Overtone Violin", Proc., Int'l
Conf. on New Interfaces for Musical Expression,
pp.34-37, 2005.
[26] S. Jorda, G. Geiger, M. Alonso, and M.
Kaltenbrunner, "The reacTable: Exploring the Synergy
between Live Music Performance and Tabletop
Tangible Interfaces", Proc., ACM Conf. on Expressive
Character of Interaction, pp.139-146, 2007.
[27] A. Crevoisier, C. Bornand, S. Matsumura, and C.
Arakawa, "Sound Rose: Creating Music and Images
with a Touch Table", Proc., Int'l Conf. on New
Interfaces for Musical Expression, pp.212-215, 2006.
[28] B. Bruegge, C. Teschner, P. Lachenmaier, E. Fenzl,
D. Schmidt, and S. Bierbaum, "Pinocchio: Conducting
a Virtual Symphony Orchestra", Proc., ACM Int'l
Conf. on Advances in Computer Entertainment
Technology, pp.294-295, 2007.
[29] N. Tokui, "Massh! - A Web-based Collective
Music Mashup System", Proc., Int'l Conf. on Digital
Interactive Media in Entertainment and Arts,
pp.526-527, 2008.
[30] R. N. Butler, "Age, Death, and Life Review",
Living With Grief: Loss in Later Life (Ed. by K. J.
Doka), Hospice Foundation of America, 2002.
[31] Y. C. Lin, Y. T. Dai, and S. L. Hwang, "The
Effect of Reminiscence on the Elderly Population: A
Systematic Review", Public Health Nursing, Vol.20,
No.4, pp.297-306, Aug. 2003.
[32] N. Alm, R. Dye, G. Gowans, J. Campbell, A.
Astell, and M. Ellis, "A Communication Support
System for Older People with Dementia", IEEE
Computer, Vol.40, No.5, pp.35-41, May 2007.
[33] N. Kuwahara, S. Abe, K. Yasuda, and K.
Kuwabara, "Networked Reminiscence Therapy for
Individuals with Dementia by Using Photo and Video
Sharing", Proc., Int'l ACM Conf. on Computers and
Accessibility, pp.125-132, 2006.
[34] Y. Wang and M. Hirakawa, "Video Editing
Based on Object Movement and Camera Motion",
Proc., ACM Int'l Working Conf. on Advanced Visual
Interfaces, pp.108-111, 2006.
End-User Development in the Medical Domain
Maria Francesca Costabile*, Piero Mussio°, Antonio Piccinno*,
Carmelo Ardito*, Barbara Rita Barricelli°, Rosa Lanzilotti*
*Dipartimento di Informatica, Università di Bari, ITALY
°DICO, Università di Milano, ITALY
{costabile, piccinno, ardito, lanzilotti}, {mussio, barricelli}
Nowadays, users are evolving from consumers of
content and tools into producers of them, also
becoming co-designers of their tools and content. In
this paper we report on a methodology that supports
this evolution. It derives from our experience in
participatory design projects to develop multimedia
systems to be used by professional people in their work
practice, supporting these people not only in
performing activities in their specific domain, but also
allowing them to tailor their virtual tools and
environments and even to create and modify software
artifacts. The latter are defined as activities of End-User
Development (EUD). We show in this paper why EUD
is particularly needed in the medical domain and how
the methodology we have defined can be successfully
applied to this domain.

1. Introduction

A significant evolution of HCI practice is now
underway. Users are evolving from consumers of
content and tools into producers of them, increasingly
becoming co-designers of their tools and content [1, 2].
This evolution poses problems to software designers,
because users require software environments with
which they can create their own tools, empowered by
the software but not obliged to become software
experts. New methodologies are arising to support this
evolution.

In this paper, we report on a methodology arising
from our experience in participatory design projects to
develop multimedia systems that support professional
people in their work practice. We illustrate our
approach by considering distributed multimedia
systems in the medical domain. Besides physicians, in
recent years we have cooperated with other communities
of professional people, such as geologists and
mechanical engineers. These communities have some
common characteristics and requirements: a) they all
perform their activities as competent practitioners, in
that "they exhibit a kind of knowing in practice, most
of which is tacit" and they "reveal a capacity for
reflection on their intuitive knowing in the midst of
action and sometimes use this capacity to cope with
the unique, uncertain, and conflicted situations of
practice" [3]; b) they are experts in a specific discipline
(e.g., medicine, geology), not necessarily in computer
science. They use their wisdom and knowledge in
performing their activities, and they need to collect and
share the knowledge they create in order to achieve
their goals. Thus, they are knowledge workers who
need to become producers of content and software
tools.

The research we have carried out in the last few
years is devoted to the design and development of
multimedia interactive systems that support people in
performing activities in their specific domains, but also
allow them to tailor these environments so that the
environments better fit their needs, and even to create
or modify software artefacts. The latter are defined as
activities of End-User Development (EUD) [1, 2]. By
end users we mean people who use computer systems
as part of daily life or daily work, but are not interested
in computers per se [1, 4]. We show in this paper why
EUD is particularly needed in the medical domain and
how the methodology we have defined to support EUD
can be successfully applied to this domain.
2. The overall approach
Over the years, we have been developing an approach
to participatory design and to the creation of software
infrastructures that support EUD activities, as well as
knowledge creation and sharing performed by
knowledge workers in a specific domain.
The approach capitalizes on the model of the HCI
process and on the theory of visual sentences we have
developed [5]. HCI is modeled as a syndetic, holistic,
dynamic process: syndetic in that it is a process in
which two systems of different nature (the cognitive
human and the computational machine) cooperate in
the development of activities; holistic in that it is a
process whose behavior emerges from the behaviors of
the two systems, and cannot be foreseen in advance;
dynamic in that the HCI process occurs through the
cyclical exchange of messages (e.g. visual, audio or
haptic messages) between human and machine in a
temporal sequence. Each message exchanged between
the two communicants is subject to two interpretations:
one performed by the human and one performed by the
computer, based on the code created by the program
designer [1].
The research resulted in the definition of the
Software Shaping Workshop (SSW) methodology [1],
which adopts a participatory approach that allows a
team of experts, including at least software engineers,
HCI experts and end users to cooperate in the design
and implementation of interactive systems. The aim of
this methodology is to create systems that are easily
understood by their users because they “speak” users’
languages. Such systems are based on an infrastructure
constituted by software environments, called Software
Shaping Workshops (SSW or briefly workshops), and
communication channels among these workshops. The
term workshop comes from the analogy with an artisan
or engineer workshop, i.e. the workroom where a
person finds all and only those tools necessary to carry
out her/his activities. Following the analogy, SSWs are
virtual workshops in which users shape their software
tools. Each adopts a domain-oriented interaction
language tailored to its user’s culture, in that it is
defined by evolving the traditional user notations and
system of signs.
End users, as knowledge workers, interact with
SSWs to perform their activities, to create and share
knowledge in their specific domains, to participate in
the design of the whole system, even at use time.
Indeed, End-User Development (EUD) implies the
active participation of end users in the software
development process allowing users to create and/or
modify software artefacts. In this perspective, tasks
that are traditionally performed by professional
software developers are transferred to end users, who
need to be specifically supported in performing these
tasks. Some EUD-oriented techniques have already
been adopted by software for the mass market, such as
the adaptive menus in MS Word™ or some
“Programming by Example” techniques in MS
Excel™. However, we are still quite far from their
systematic adoption.
To permit EUD activities, we defined a meta-design
approach that distinguishes two phases: the first phase
consisting in designing the design environment (meta-
design phase), the second one consisting in designing
the actual applications by using the design
environment. The two phases are not clearly distinct
and are executed several times in an interleaved way,
because the design environments evolve both as a
consequence of the progressive insights the different
stakeholders gain into the design process and as a
consequence of the feedbacks provided by end users
working with the system in the field [1, 2].
The methodology offers to each expert (software
engineers, HCI experts, end users as domain experts) a
software environment (SSW), by which the expert
contributes to shape software artefacts. In this way the
various experts, each one through her/his SSW, can
access and modify the system of interest according to
her/his own culture, experience, needs, skills. They can
also exchange the results of these activities to converge
to a common design. The proposed approach fosters
the collaboration among communities of end users,
managers, and designers, with the aim of increasing
motivation and reducing cognitive and organizational
cost, thus providing a significant contribution to
EUD’s evolution.
The SSW infrastructure resulting from the
application of the SSW methodology is a network of
interactive environments (software workshops) which
communicate through the exchange of annotations and
boundary objects. In particular, the prototype of the
application being developed is used as a boundary
object, which can be used and annotated by each
stakeholder [6]. Each stakeholder participates in the
design, development and use of the infrastructure,
reasoning and interacting with software workshops
through her/his own language. Therefore, the
workshops act as cultural mediators among the
different stakeholders by presenting the shared
knowledge according to the language of each
stakeholder.

3. Multimedia systems in the medical domain
The evolution of information technology may
provide a valuable help in supporting physicians’ daily
tasks and, more importantly, in improving the quality
of their medical diagnosis.
In current medical practice, physicians have the aid
of different types of multimedia documents, such as
laboratory examinations, X-rays, MRI (Magnetic
Resonance Imaging), etc. Physicians with different
specializations usually analyze such multimedia
documents giving their own contribution to the
medical diagnosis according to their “expertise”.
However, this team of specialists cannot meet as
frequently as needed to analyze all clinical cases,
especially when they work in different hospitals or
even in different towns or states. This difficulty can be
overcome by providing physicians with computer
systems through which they can cooperate at a distance
in a synchronous and/or asynchronous way, also
managing multimedia documents. In [7], we provide
an example of such a system, which has been proposed
to support neurologists working at the neurology
department of the "Giovanni XXIII" Children's
Hospital in Bari, Italy. It gives them the possibility of
organizing virtual meetings with neuro-radiologists
and other experts, who may contribute to the definition
of a proper diagnosis. The system is the result of an
accurate user study, primarily aimed at understanding
how the physicians collaborate in the analysis of
clinical cases, so that functional and user requirements
can be properly derived.
The study also revealed that physicians with
different specializations adopt different languages to
communicate among themselves and to annotate shared
documents. For example, neurologists and neuro-radiologists
represent two sub-communities of the
physician community: they share patient-related data
archives, some models for their interpretation, but they
perform different tasks, analyze different multimedia
documents (e.g., EEGs, in the case of neurologists,
MRIs, in the case of neuro-radiologists) and annotate
them with different notations, developed during years
of experience. Such notations can be considered two
(visual) languages.
The system described in [7] provides neurologists
and neuro-radiologists with software environments and
tools which are both usable and tailorable to their
needs. It has been designed by adopting the SSW
methodology [1]. Thus each specialist works with
her/his own workshop to analyze the medical cases of
different patients and to formulate her/his own
diagnosis, taking into account the opinions of the other
colleagues provided by the system, without the need of
a synchronous consultation.
More specifically, if the neurologist needs to
consult a neuro-radiologist, he makes a request by
opening an annotation window. This window permits
the physician to articulate the annotation into two parts: the question
to be asked of the colleague, and a description which
summarizes information associated with the question. A
third part can be tailored according to the addressee of
the consultation request: if s/he is a physician who
needs more details about the clinical case, the sender
may activate the detailed description and fill it in;
otherwise s/he can hide it. In other words, the
physician who wants to ask for a consultation is
allowed to compose a tailored annotation specific to
the physician s/he is consulting. In a similar way, a
physician can make a different type of annotation in
order to add a comment, which is stored and possibly
viewed by other colleagues, thus updating the
underlying knowledge base.
In the SSW approach, electronic annotation is a
primitive operator, on which the communication
among different experts is based. Moreover, the
annotation is also a tool through which end users
produce new content that enriches the underlying
knowledge base. An expert has the possibility of
performing annotations of various elements of the
workshops, such as a piece of text, a portion of an
image, a specific widget; through the annotation, the
expert makes explicit her/his insights regarding a
specific problem. The annotation is a peer-to-peer
communication tool when it is used by experts to
exchange annotated documents while performing a
common task (e.g., defining a medical diagnosis). An
expert can also annotate the workshop s/he is using,
since annotation is also a tool used to communicate
with the design team in charge of the maintenance of
the system. The annotations are indexed as soon as
they are created, by the use of a dictionary that is
defined, updated and enriched by the experts
themselves. The terms defined in the dictionary allow
the experts to annotate using the language in which
they are proficient. They also permit the
communication and understanding among the different
actors having different expertise and languages.
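The paper does not give a concrete data model for SSW annotations; the following is only an illustrative sketch of the dictionary-indexing idea just described, assuming an annotation is attached to a workshop element and indexed by expert-defined dictionary terms at creation time (all class and field names here are our own assumptions, not the SSW implementation):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary-indexed annotations (names are ours, not the SSW's).
class Annotation {
    final String author;        // e.g., a neurologist or neuro-radiologist
    final String targetElement; // piece of text, portion of an image, widget id
    final String text;          // the content of the annotation

    Annotation(String author, String targetElement, String text) {
        this.author = author;
        this.targetElement = targetElement;
        this.text = text;
    }
}

class AnnotationIndex {
    // Dictionary terms are defined, updated and enriched by the experts themselves.
    private final Map<String, List<Annotation>> byTerm = new HashMap<>();

    // Annotations are indexed as soon as they are created.
    void add(Annotation a, List<String> dictionaryTerms) {
        for (String term : dictionaryTerms) {
            byTerm.computeIfAbsent(term, t -> new ArrayList<>()).add(a);
        }
    }

    List<Annotation> find(String term) {
        return byTerm.getOrDefault(term, List.of());
    }
}
```

In this sketch, retrieval by a shared dictionary term is what lets experts with different notations reach each other's annotations.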
4. EUD for managing Electronic Patient Records
The system described in the previous section allows
its end users to perform some EUD activities.
However, in the same medical domain, it is the
management of the Electronic Patient Record (EPR)
that pushes even more towards enabling EUD, as we
will show in the following.
4.1 EPR
The current implementation of the EPR causes many
problems because it is still commonplace
that individual hospitals, and even specific units within
the same hospital, create their own standard
procedures, so that physicians, nurses and other
operators in the medical field are reluctant to accept a
common unified format. Actually, they need to
customize the patient record and adapt it to their
specific needs [8]. Thus, the EPR is a natural target for EUD.
The patient record is many-sided because it is a
document to be read and understood by various and
very different actors, such as physicians, nurses,
patients’ relatives, the family doctor, etc., so that it
must have the ability to speak different “voices”, i.e.,
to convey different meanings according to the actors
using it [9].
The patient record contains at least two clearly
intertwined voices: a voice reporting what health
professionals did to patients during their stay in the
hospital; and another voice attesting that clinicians
have honored claims for adequate medical care. Patient
records are official, inscribed artifacts that practitioners
write to preserve memory or knowledge of facts and
events that occurred in the hospital ward [10].
The patient record has two main roles: a short-term
role, which refers to collecting and storing data to keep track
of the care during the patient's hospital stay; and a
long-term role, which refers to archiving the patient's data for
research or statistical purposes [11]. Accordingly, the
specialized literature distinguishes between primary
and secondary purposes, respectively. Primary
purposes regard the demands for autonomy and
support of practitioners involved in the direct and daily
care of patients; while secondary purposes are the main
focus of hospital management, which pursues them for
the sake of rationalizing care provision and enabling
clinical research [9]. Our goal takes into account the
primary purpose of patient record by designing an
Electronic Patient Record (EPR) whose document
structures and functionalities are aimed at supporting
information inscription according to the specific needs
of each involved stakeholder.
In this scenario, document templates and masks are
usually imposed on practitioners, without considering
the specific needs and habits of those who are actually
using the EPR. The combination of requirements for
both standardization and customization means that
EPR systems are a natural target for EUD [9].
Again, in collaboration with the physicians of the
“Giovanni XXIII” Children Hospital of Bari, Italy, we
conducted a field study on the patient record and its
use through unobtrusive observations in the wards,
informal talks, individual interviews with key doctors
and nurses, and open group discussions with ward
practitioners. During the study, the analysts
periodically observed the physicians during their daily
work in the hospital (about 2-3 visits per month for
two months). They observed how the identified
stakeholders, i.e. head physicians, physicians, nurses,
administrative staff, etc., of the same hospital manage
paper-based patient records; our aim was to better
understand which kind of documents, tools and
languages are used. The information collected during
the study has been used to identify the right
requirements of an application implementing the EPR.
The most important point that emerged is that each
ward actually has its own specific patient record,
even within the same hospital; this is because different
data need to be stored in the EPR, depending on
the specific ward. For example, in a children's
neurological ward, information about newborn feeding
must also be available, while in an adult neurological
ward, information about alcohol and/or drug intake is required.

Figure 1. A screen shot of the SSW for the head physician "unic" of the "Neurologia" (neurology) ward.
The different patient records can be seen as being
composed of modules, each one containing specific
fields for collecting patient data. Various stakeholders
use the patient record in different ways and to
accomplish different tasks, i.e., the nurse records the
patient measurements, the reception staff records the
patient personal data, the physician examines the
record to formulate a diagnosis, and so on. We realized
that the patient records used in different wards
assemble a subset of modules in different ways,
customized to the need of the specific ward. Thus, our
approach was to identify the data modules that have to
be managed in the whole hospital and let each head
physician to design the EPR for her/his ward by
composing a document through direct manipulation of
such modules.
From a set of predefined modules, the head physician chooses those appropriate for his ward and
assembles them in the layout he prefers. Figure 1
shows the SSW for the neurology head physician
(“Primario Reparto: Neurologia” in Italian). The
working area of the SSW is divided into two parts: in
the left part there are all the modules he can insert in the
EPR ("Moduli Inseribili" in Italian), e.g., "Misure …",
"Esami Fuori Sede", etc. ("Entrance Anthropometric
Measurements", "Feeding", and "External Examinations" in
English, respectively); in the right part there are the
modules he is using to compose the tailored EPR
("Cartella Clinica" in Italian), e.g., personal data,
"Routine Ematica" and "Consulenze Inviate"
("Hematic Routine" and "Sent Counsels" in English,
respectively). He composes the EPR by simply dragging a
module selected in the left part and dropping it in the
desired position in the EPR he is building in the
right part of the working area.
Once the EPR design is completed, the head
physician clicks on the “Save” button. In this simple
way, he has actually created a software artefact that
will be used by his ward personnel.
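The prototype's internals are not described in the paper; the compose-by-modules idea can nonetheless be illustrated with a minimal sketch. Here the class and method names (EprTemplate, insertModule, moveModule) are our own assumptions: the head physician's full EUD corresponds to inserting predefined modules, while the nurse's restricted tailoring corresponds to only rearranging the layout:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an EPR as a composition of predefined modules
// (names and API are ours, not the prototype's).
class EprTemplate {
    private final List<String> modules = new ArrayList<>();

    // Head physician: full EUD, may insert modules from the predefined set.
    void insertModule(String module, int position) {
        modules.add(position, module);
    }

    // Both head physician and nurse may rearrange the layout.
    void moveModule(int from, int to) {
        String m = modules.remove(from);
        modules.add(to, m);
    }

    List<String> layout() {
        return List.copyOf(modules);
    }
}
```

A nurse-facing SSW would expose only moveModule, which matches the restriction described below.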
Figure 2 shows the EPR designed for the neurology
ward as it appears in the SSW for nurses. A nurse
primarily uses the EPR to input data about patients. This
end user does not have all the EUD possibilities allowed to
the head physician in his SSW: her/his tailoring is
limited to modifying the layout of the EPR modules. This is
because, if the nurse has to insert data in some specific
modules, s/he prefers to move these modules to the top
in order to find them quickly. Figure 2 shows a
4.2 Co-designing the EPR with end users
The design of a prototype system to manage the EPR
followed the SSW approach, creating a software
environment (SSW) for each type of stakeholder to
allow them to accomplish their daily tasks in a
comfortable and suitable way, as well as to give them
the possibility of tailoring the SSW through EUD
activities.
In particular, an SSW has been developed for the
head physician, in which he can design the EPR
tailored to the needs of his ward. The system supports
his design activity by providing the SSW for the head
physician with a set of predefined modules.
Figure 2. A screen shot of the SSW for the “Neurologia” (neurology) ward nurse “aner”.
situation in which the pointer is on the module
"Routine Ematica" ("Hematic Routine" in English)
because the nurse wants to move this module to a
different position.
5. Conclusions
This paper has discussed how to support end users
who are increasingly willing to become co-designers
of their tools and content. It has argued why End-User
Development is particularly needed in the medical
domain, where physicians, nurses, radiologists and other
actors in the field are the end users. Furthermore, it has
shown how the SSW methodology, which has been
defined to create interactive systems that support EUD,
can be successfully applied to this domain.
The infrastructure proposed by the SSW methodology
to create interactive systems as a network of software
environments (the SSWs) is implemented by
exploiting a suite of XML-based languages.
Specifically, the SSWs of the EPR prototype are
implemented as IM2L programs that are interpreted by
a specialized engine, which is a plugin of the web
browser [12, 13]. IM2L (Interaction Multimodal
Markup Language) is an XML-based language for the
definition of software environments at an abstract
level. In other words, environment elements and their
behaviours are defined in a way that is independent of
cultural and context-of-use characteristics; such
characteristics are specified through other XML-based
documents. The engine interprets these documents to
instantiate the EPR SSWs, which are rendered by an
SVG viewer under the coordination of the web
browser [14]. As future work, we have planned an
experiment with the end users. We will consider as
quantitative metrics both the execution time of the
assigned tasks and the errors made by the users. From
a qualitative point of view, we will administer a
post-experimental survey based on the SUS (System
Usability Scale) method [15].
6. Acknowledgments
This work was supported by the Italian MIUR and
by EU and Regione Puglia under grant DIPIS and by
the 12-1-5244001-25009 FIRST grant of the
University of Milan.
7. References
1. M.F. Costabile, D. Fogli, P. Mussio and A. Piccinno,
"Visual Interactive Systems for End-User Development:
A Model-Based Design Methodology," IEEE Transactions
on Systems, Man and Cybernetics, Part A: Systems and
Humans, vol. 37, no. 6, 2007, pp. 1029-1046.
2. G. Fischer and E. Giaccardi, "Meta-Design: A
Framework for the Future of End User Development,"
End User Development, H. Lieberman, F. Paternò and V.
Wulf, eds., Springer, 2006, pp. 427-457.
3. D.A. Schön, The Reflective Practitioner: How
Professionals Think in Action, Basic Books, 1983, p. 374.
4. A. Cypher, ed., Watch What I Do: Programming by
Demonstration, MIT Press, 1993.
5. P. Bottoni, M.F. Costabile and P. Mussio, "Specification
and dialogue control of visual interaction through visual
rewriting systems," ACM Transactions on Programming
Languages and Systems (TOPLAS), vol. 21, no. 6, 1999,
pp. 1077-1136.
6. M. Costabile, P. Mussio, L. Parasiliti Provenza and A.
Piccinno, "Supporting End Users to Be Co-designers of
Their Tools," End-User Development, V. Pipek, M.B.
Rosson, B. de Ruyter and V. Wulf, eds., Springer, 2009,
pp. 70-85.
7. M.F. Costabile, D. Fogli, R. Lanzilotti, P. Mussio and A.
Piccinno, "Supporting Work Practice Through End-User
Development Environments," Journal of Organizational
and End User Computing, vol. 18, no. 4, 2006, pp. 43-65.
8. C. Morrison and A. Blackwell, "Observing End-User
Customisation of Electronic Patient Records," End-User
Development, V. Pipek, M.B. Rosson, B. de Ruyter and
V. Wulf, eds., Springer, 2009, pp. 275-284.
9. F. Cabitza and C. Simone, "LWOAD: A Specification
Language to Enable the End-User Development of
Coordinative Functionalities," End-User Development,
V. Pipek, M.B. Rosson, B. de Ruyter and V. Wulf, eds.,
Springer, 2009, pp. 146-165.
10. M. Berg, "Accumulating and Coordinating: Occasions
for Information Technologies in Medical Work,"
Computer Supported Cooperative Work (CSCW), vol. 8,
no. 4, 1999, pp. 373-401.
11. G. Fitzpatrick, "Integrated care and the working record,"
Health Informatics Journal, vol. 10, no. 4, 2004, pp. 291-302.
12. B.R. Barricelli, A. Marcante, P. Mussio, L. Parasiliti
Provenza, M. Padula and P.L. Scala, "Designing
Pervasive and Multimodal Interactive Systems: An
Approach Built on the Field," Handbook of Research on
Multimodal Human Computer Interaction and Pervasive
Services: Evolutionary Techniques for Improving
Accessibility, P. Grifoni, ed., Idea Group Inc., to appear.
13. D. Fogli, G. Fresta, A. Marcante and P. Mussio, "IM2L:
A User Interface Description Language Supporting
Electronic Annotation," Proc. Workshop on Developing
User Interfaces with XML: Advances on User Interface
Description Languages, AVI 2004, 2004, pp. 135-142.
14. W3C, "Scalable Vector Graphics (SVG)," 2009.
15. J. Brooke, "SUS: A quick and dirty usability scale,"
Usability Evaluation in Industry, P.W. Jordan, B.
Weerdmeester, A. Thomas and I.L. McLelland, eds.,
Taylor and Francis, 1996.
Transformation from Web PSM to Code
Yen-Chieh Huang1,2, Chih-Ping Chu1, Zhu-An Lin1, Michael Matuschek3
1 Department of Computer Science and Information Engineering,
National Cheng-Kung University, Tainan, Taiwan
2 Department of Information Management, Meiho Institute of Technology, Pingtung, Taiwan
3 Department of Computer Science, University of Duesseldorf, Germany
E-mail: [email protected]
This research proposes how class diagrams
that use the Unified Modeling Language (UML) can
be converted to a user interface of a Web page using
the Model Driven Architecture (MDA). From the
Platform Independent Model (PIM) we go to the Web
Platform Specific Model (PSM), and then to the
direct generation of code templates for Web page
applications. In this research the class diagrams are
drawn with Rational Rose; then, using our
self-developed program, these diagrams are
transformed into code templates with Servlets, JSP,
and JAVA. We implement a case study for verification,
and then calculate the transformation rate as the
lines-of-code (LOC) coverage rate by measuring the LOC
after transformation and after the system is finished.
The results show that the transformation rate is about
thirty-six to fifty percent, which indicates that this
approach can help programmers to greatly reduce
the development period.

2. Literature Review

The object-oriented paradigm has gained
popularity in various guises not only in programming
languages, but also in user interfaces, operating
systems, databases, and other areas [2]. Classification,
object identity, inheritance, encapsulation,
polymorphism and overloading are the most prominent
concepts of object-oriented systems [3]. The UML is
a modeling language that helps in describing and
designing software systems, particularly software
systems built using the object-oriented approach.
This research uses Robustness diagrams [4] for
describing the application environment of Web pages.
The MDA is a framework for software
development defined by the Object Management
Group (OMG); it places models at the center of the
software development process [5, 6]. The MDA
development life cycle includes four kinds of models.
Computation Independent Models (CIM) describe
the requirements for the system and represent the
highest-level business model. It is sometimes called
“domain model” or “business model”. A PIM
describes a system without any knowledge of the
final implementation platform, and this PIM is
transformed into one or more PSMs. A PSM is
tailored to specify a system in terms of the
implementation constructs that are available in one
specific implementation technology. The final step in
the development is the transformation of each PSM to
code. The CIM, PIM, PSM, and code are shown as
artifacts of different steps in the software
development life cycle, which is shown in Figure 1.
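The artifact chain just described (CIM to PIM to PSM to code, each model transformed into the next) can be summed up in a tiny illustrative sketch; the enum and method names are ours, not part of the MDA standard:

```java
// Illustrative sketch of the MDA life cycle's artifact chain:
// each model is transformed into the next one, ending in code.
enum MdaArtifact {
    CIM, PIM, PSM, CODE;

    // Transformation step: returns the next artifact in the life cycle.
    MdaArtifact transform() {
        if (this == CODE) {
            throw new IllegalStateException("Code is the final artifact");
        }
        return values()[ordinal() + 1];
    }
}
```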
Keywords: Model Driven Architecture, Platform
Independent Model, Platform Specific Model
Software is largely intangible [1]. Software
development has gradually moved from structured
analysis and design to object-oriented analysis and
design, but the software industry is still labor intensive:
even after finishing the system analysis, programmers
still start from scratch and write the code. Especially
in application software development for Web pages,
many approaches have been proposed in the last few
years to reduce code and development time. This research
focuses on how class diagrams can be transformed into
Web pages; the results could reduce the development
time for Web page programmers. Common Web page
developing tools include JSP, PHP, ASP, etc. The
platform used in this research is JAVA, the Web page
developing tool is JSP, and the relevant technologies are
JSP, Servlets and Ajax. This research uses IBM Rational
Rose as the CASE tool for class diagram object
modeling, and the user interface code templates are
then created via the conversion program written by us.
Figure 1. MDA software development life cycle and
output artifacts
The most widely used architecture in the
environment of Web applications is the Browser/Server
(B/S) approach, a specific form of the
Client/Server (C/S) structure [7]. The basic
architecture of Web systems includes a client browser,
a Web server, and a connecting network. The
principal protocol for communication is the
Hypertext Transfer Protocol (HTTP). The principal
language for expressing the content exchanged between the
client and the server is the Hypertext Markup
Language (HTML) [8].
Relevant technologies for today's Web
applications include CGI, Applets, ActiveX controls,
plug-ins, Ajax, etc. To explain the general
structure of such a Client/Server system, a Web page
can be modeled as a class, and a client page can be
modeled as another class, which must be drawn using
the method of extending UML [9].
3. Transformation from Class Diagrams to Web Pages
In the concept of MDA we must first create the
PSM design for a specific Web application. A Web
page can be expressed by class diagrams where every
stereotype (including stereotype classes and
associations) is defined in order to describe the
situation of every Web page, then the Web class
diagrams can be drawn and, in the final step, it can be
transformed into a code template.
3.1 Web Pages Components Mapping Methods
3.1.1 Stereotypes
In order to extend the expressiveness of UML,
we can use stereotypes to strengthen and refine the
class model. Stereotypes allow us to give a more
precise description of class objects; they can be
used for describing and constraining the characteristics of
module components, and they are a standard
UML extension mechanism [10]. In this paper, we use Rational
Rose to define control classes and strengthen the
classes that describe the Web pages. This research
proposes stereotype class mapping methods as
described in Table 1.
Table 1. Stereotypes Mapping in Class Diagrams

<<Servlet>>: Responsible for handling the request of the
client side and communicating with the
back-end module. The methods of this
class contain at least Get() or Post().

<<Server Page>>: A server page represents the server-side
information; the attributes and methods
in this class are implemented by
scripting elements.

<<Client Page>>: A client page represents the <HTML>
element, which has two principal child
elements: <HEAD> and <BODY>.
The <HEAD> represents structural
information about the Web page; the
<BODY> element represents the
majority of the displayed content [8].

<<Form>>: The HTML <<Form>> stereotype class
represents attributes such as
input boxes, text areas, radio buttons,
check boxes, and hidden fields; these
classes map directly to a <Form>
element [8].

<<Model>>: A <<Model>> stereotype class
represents the logical operation of
business processes, which is
implemented by JAVA. Its meaning is
the same as in traditional class diagrams,
therefore the class diagram notation can
ignore the <<Model>> stereotype in
this research.
Table 2. Association Stereotypes

<<Build>>: An action in which a Servlet or a
Server Page creates a Client Page or
a Form.

<<Link>> [8]: A relationship between a client page
and a server-side resource or Web page;
a directional association from a Web
page to another Web page.

<<Redirect>>: The client page should be
automatically replaced with another
client page, where Post and Get are
two methods to achieve this, among
others.

<<Object>> [8]: Represents many types of
embedded objects, such as Applets and
ActiveX controls.
The parameters for the object are
defined in the parameterized class.

<<Asynchronous>>: The client page sends an
asynchronous request to the Servlet.

<<Submit>>: A relationship between a <<Form>>
and a server page. Post or Get are
used for submitting, among others.
3.1.2 Association Stereotypes
In order to implement Web modules, it is vital
to control client-side and server-side requests and
responses via HTTP over the network. Using
association stereotypes between classes is an optional
way to model HTTP parameters, and it is useful when
parameters are relatively complex or have special
semantics and extra documentation is necessary.
Therefore, this research proposes the mapping
methods of association stereotypes between classes as
shown in Table 2.
3.2 PSM to Code Template Transformation
Every stereotype class has a different
transformation model; here we describe a Servlet
transformation rule as an example (see Table 1 for the
stereotype mappings). The attributes and
methods in a Servlet are implemented in traditional
JAVA, but the difference lies in the associations
between classes. Generally speaking, a Servlet must
accept a Form request, and then a redirection to
another Web page occurs. Its transformation steps are
as follows:
1. <<Form>> request: according to the Form request
association name (Get or Post), declare the
method doGet or doPost.
2. <<Client Page>> asynchronous: in the Servlet,
implement the asynchronous pattern and then
declare the method doAsynWork.
3. <<Redirect>>: generate the code as follows:
RequestDispatcher view = request.getRequestDispatcher("/****Redirect Page****");
view.forward(request, response);
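The authors' self-developed conversion program is not listed in the paper; the following is only a rough sketch of what one such PSM-to-template step might look like under rules 1 and 3 above. The class, method and page names (ServletTemplateGenerator, LoginServlet, Index.jsp) are our own illustrative assumptions, not the authors' tool:

```java
// Hypothetical sketch (not the authors' actual generator): given the name of a
// <<Servlet>> stereotype class, the HTTP method named by its <<Form>> request
// association (Get or Post, rule 1), and the <<Redirect>> target page (rule 3),
// emit a Servlet code template as a string.
public class ServletTemplateGenerator {

    public static String generate(String className, String httpMethod, String redirectPage) {
        // Rule 1: the association name (Get or Post) selects doGet or doPost.
        String handler = httpMethod.equalsIgnoreCase("Post") ? "doPost" : "doGet";
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" extends HttpServlet {\n");
        sb.append("    protected void ").append(handler)
          .append("(HttpServletRequest request, HttpServletResponse response)\n");
        sb.append("            throws ServletException, IOException {\n");
        sb.append("        // TODO: business logic is not generated\n");
        // Rule 3: the <<Redirect>> association produces the dispatcher code.
        sb.append("        RequestDispatcher view = request.getRequestDispatcher(\"/")
          .append(redirectPage).append("\");\n");
        sb.append("        view.forward(request, response);\n");
        sb.append("    }\n}\n");
        return sb.toString();
    }
}
```

For example, generate("LoginServlet", "Post", "Index.jsp") yields a doPost skeleton that forwards to /Index.jsp; the business logic inside the handler is exactly the part the transformation cannot produce.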
5. Case Study
5.1 Experiment Steps
The CASE Tool selected for this experiment is
the Rational Rose from IBM which transforms class
diagrams into code templates. First, Rational Rose is
used to draw the class diagrams, then the labels of the
stereotypes are added in the class diagrams, and lastly
we utilize the program developed by ourselves to
transform the class diagrams into code templates.
5.2 Case Description
To verify the theoretical structure proposed by
this research we use the practical example of a
Login/Register System. It has three main functions in
the Use Case Diagram: "Account registration", "User
login", and "Display home page".
Figure 3 is a class diagram of the PIM of a user
Login/Register System which reflects the Use Case
diagrams. In the preliminary design, which uses
Robustness diagrams for description, we include the
entity classes, boundary classes and control classes.
Boundary classes represent the shown Web page
content, i.e. the information in the system, such as the
account and password fields that LoginClient offers
for the user login. Control classes deal with the
parameter request by the boundary classes, such as
login request to LoginServlet of LoginClient, and
they are determined to call out Register of the Entity
class to deal with the request.
For the experimental evaluation we adopt
“code coverage” to calculate the result. Code
coverage is a measure used in software testing. It
describes the degree to which the source code of a
program has been tested. In this research, code
coverage represents the ratio of information in class
diagrams to the information in the full implemented
system. Talking about information, we define the way
of measurement and standard of quantification
analysis as follows:
4.1 The Way of Measurement
In a software development project, software
measurement can be achieved in a lot of ways, such
as lines of code (LOC), function point (FP), object
point, COCOMO, and Function requirement etc. We
choose LOC, and the reasons are:
1. The value is easily measured.
2. There is a direct relationship to the measurement
of person-months (effort).
3. Effort is also a size-oriented software metric [11].
A class diagram expresses static information as
well as the relations between classes, and the resulting
LOC can be easily counted automatically after transformation.

Figure 3. The PIM of a Login/Register System
Use Case 1: Account Registration
This use case includes the boundary classes
RegisterClient, RegisterForm, and RegisterBackForm,
the control class RegisterServlet, and the entity class
Register as back end. Between the classes
RegisterClient and RegisterServlet, there are
asynchronous relations, so the Ajax pattern will be
used for realizing the code transformation. When the
user succeeds in registering, the class RegisterServlet
will redirect him to the Index home page.
Use Case 2: User Login
This use case includes the boundary classes
LoginClient, LoginForm, and LoginToRegister and
the control class LoginServlet. When the user inputs
his account and password, the class LoginForm will
send a request to the class LoginServlet using the
Post method, and then the class LoginServlet makes
the decision if the user is redirected to the Index or
the class LoginClient.

4.2 Counting Standard
LOC counters can be designed to count
physical lines, logical lines, or source lines by using a
coding standard and a physical LOC counter. For
different coding styles, the LOC turns out
differently, so we need to define the coding standard
and counting standard which we use for our
measurement. In this research, line counters are defined as follows:
1. For XML predefined and self-defined tags in Web
pages, a set of tags counts as one line.
2. If the Web page content is not XML (e.g., scripts,
scriptlets, and expressions), every line of code
counts as one line.
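The paper does not list its line counter; as a minimal sketch of the two counting rules, the following hypothetical counter treats each element's opening tag (including self-closing tags) as one counted line in XML mode, which is our simplifying interpretation of "a set of tags counts as one line":

```java
import java.util.List;

// Hypothetical LOC counter following the paper's two rules
// (simplified; names and tag handling are our own assumptions).
public class LocCounter {

    public static int count(List<String> lines, boolean isXml) {
        int loc = 0;
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty()) continue;
            if (isXml) {
                // Rule 1: count a set of tags (an element's opening tag together
                // with its closing tag) as one line, so only opening tags count.
                for (int i = 0; i < s.length(); i++) {
                    if (s.charAt(i) == '<' && i + 1 < s.length()
                            && s.charAt(i + 1) != '/') {
                        loc++;
                    }
                }
            } else {
                // Rule 2: every non-blank line of non-XML code counts as one line.
                loc++;
            }
        }
        return loc;
    }
}
```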
Use Case 3: Display the home page
The home page includes Index and
AVLTreeApplet, and is displayed by a Java Applet. It
is described how the Applet object is loaded and
integrated into the Index home page via object
parameter classes.
Because of the definition of class diagrams and
the representing models, they can only express static
class content and relationships. There are also other
aspects that cannot be described in the design and
transformation of more complicated program logic.
For this reason, we can make use of sequence
diagrams and state diagrams in order to describe
dynamic calls and the transfers between states. Further
research will therefore study how to create Web code
templates from interaction diagrams and behavior diagrams.
5.3 Measurement Result
We measured the LOC of the code template for
each use case after transformation and the LOC of the
finished system by the previously defined counting
standard. The data is shown in Table 3.
Table 3. Measurement Result
(columns: LOC of code template after transformation; LOC of the finished system)
Use Case 1: Account Registration: 42, 14
Use Case 2: User Login
Use Case 3: Display home page
[1] Lethbridge, T.C. and Laganière, R., Object-Oriented
Software Engineering: Practical Software Development
using UML and JAVA, Second Edition, McGraw-Hill, 2005.
[2] Nierstrasz, O., A Survey of Object-Oriented
Concepts, in Object-Oriented Concepts,
Databases and Applications, W. Kim and F.
Lochovsky, eds., ACM Press and Addison-Wesley, 1989,
pp. 3-21.
[3] Gottlob, G., Schrefl, M. and Rock, B.,
Extending Object-Oriented Systems with Roles,
ACM Transactions on Information Systems, Vol.
14, No. 3, Jul. 1996, pp. 268-296.
[4] Ambler, S.W., The Object Primer: Agile
Model-Driven Development with UML 2.0,
Cambridge University Press, 2004.
[5] Kleppe, A., Warmer, J. and Bast, W., MDA
Explained: The Model Driven Architecture:
Practice and Promise, Addison Wesley, Apr. 2003.
[6] Koch, N., Transformation techniques used in
UML-based Web engineering, IET Software, Vol. 1,
Issue 3, Jun. 2007, pp. 98-111.
[7] Li, J., Chen, J. and Chen, P., Modeling Web
Application Architecture with UML, IEEE
CHF, 30 Oct. 2000, pp. 265-274.
[8] Conallen, J., Building Web Applications with
UML, Second Edition, Addison Wesley, 2002.
[9] Conallen, J., Modeling Web Application
Architectures with UML, Communications of
the ACM, Vol. 42, No. 10, Oct. 1999.
[10] Djemaa, R.B., Amous, I. and Hamadou, A.B.,
WA-UML: Towards a UML extension for
modeling Adaptive Web Applications, Eighth
IEEE International Symposium on Web Site
Evolution, 2006, pp. 111-117.
[11] Humphrey, W.S., PSP: A Self-Improvement
Process for Software Engineers, Addison
Wesley, Mar. 2005.
The results show that the transformation rate is
about thirty-six to fifty percent. When we focus on
the part of each class not responsible for the program
logic, this is a relatively high proportion. The code
template obtained by transformation according to
the defined Web page class diagrams represents the
static structure model of the system, consisting of
attributes, operations, and associations between
classes. However, the system's operation logic cannot
be expressed in detail; this part is still up to the programmers.
6. Conclusion
Nowadays, Web code must be programmed from scratch even after the PSM analysis is finished. In this research we proposed a method of code template transformation: by adding stereotypes to class diagrams, the diagrams can describe Web pages and synchronous or asynchronous relations, and we can transform them into code templates with distinct logic, control, and view code blocks using JSP and Servlets or the MVC model.
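As an illustration of the kind of transformation described here, the following Python sketch turns a stereotyped Web-page class description into a code template with separate view, control, and logic blocks. All names and the data layout are invented for the example; the actual tool and its notation are not shown in this excerpt.

```python
def to_template(cls):
    """Render a stereotyped Web-page class description as an
    MVC-style code template (hypothetical sketch)."""
    lines = [f"// <<{cls['stereotype']}>> {cls['name']}"]
    # view block: one placeholder per attribute of the page class
    lines.append("// --- view block ---")
    for attr in cls["attributes"]:
        lines.append(f"render_field('{attr}')  // TODO: page markup")
    # control block: one handler per association, sync or async
    lines.append("// --- control block ---")
    for rel in cls["relations"]:
        kind = "ajax_call" if rel["async"] else "forward"
        lines.append(f"{kind}('{rel['target']}')  // TODO: navigation")
    # the logic block is left to the developer (the 36-50% gap noted above)
    lines.append("// --- logic block: to be written by hand ---")
    return "\n".join(lines)

page = {"name": "Login", "stereotype": "server page",
        "attributes": ["user", "password"],
        "relations": [{"target": "Welcome", "async": False},
                      {"target": "CheckName", "async": True}]}
print(to_template(page))
```

The template deliberately stops at placeholders: as the results above note, only the static structure is generated, while the operational logic remains hand-written.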
Asynchronous relations can be realized in many ways; this research adopts the approach of Foundations of Ajax to let the client side respond to the server side. Furthermore, reverse engineering is a factor to be considered, so that changes in the code can be reflected back into the Web class diagrams.
For the case study example in this research, the class diagram transformed into code templates covers only about thirty-six to fifty percent of the whole system, which shows that it does not capture all the information we want. Because of the definition of class diagrams and
Experiences with visual programming in
Engineering Applications
Valentin Plenk
University of Applied Sciences Hof
Alfons Goppel Platz 1, 95028 Hof, Germany
[email protected]
In children’s playrooms and in secondary school projects, programmable toys and entry-level programming courses use visual programming languages instead of the (standard) textual source code seen in Logo, BASIC, or Java. Higher education and research also propose visual programming or even (graphical) model-based design to steepen the learning curve1.
Industry, however, appears unfazed by this approach. Textual source code is still the main means of representing software.
Based on experience gained in laboratory exercises conducted with students of an undergraduate course in mechatronics, this paper addresses the feasibility and efficiency of this approach.
1. Introduction
A wide range of research papers proposes graphical representations for complex software, ranging from domain-specific code generators (e.g. [6], [9]) to software models expressed in UML (e.g. [4], [18]).
[3] succinctly summarizes the reasoning for the visual representation:
The human visual system and human visual
information processing are clearly optimized for
multi-dimensional data. Graphical programming
uses information in a format that is closer to the
user’s mental representations of problems, and allows data to be manipulated in a format closer to
the way objects are manipulated in the real world.
Another motivation for using graphics is that it tends to be a higher-level description of the desired action (often de-emphasizing issues of syntax and providing a higher level of abstraction) and may therefore make the programming task easier even for professional programmers.

1 In this context a “steep” learning curve means quick progress in learning – the increase in knowledge over time grows steeply (at least during the initial stages of learning).
This research effort is flanked by a wide range of commercially available, domain-specific visual programming and execution environments; examples include National Instruments' LabVIEW, Agilent's VEE, IEC 61131-3 Sequential Function Charts, and MathWorks' Simulink. The following Wikipedia links give a quick synopsis of these products: [12, 13, 14, 15, 16, 17]. More information can be found on the respective products' websites.
In daily practice the Unified Modeling Language [7] has become a (graphical) standard in the early phases of the software development process, i.e. in design documents. The diagrams are used to describe the architecture of a software product at a more or less abstract level. Recently, efforts to execute the models have become visible.
However, the vast majority of actual software products is still implemented in textual source code. Common sense apparently considers the available tools unprofessional or unsuited for big projects. There are few, if any, publications investigating the validity of this opinion.
To contribute some facts this paper summarizes experiences with visual programming made in an undergraduate
course at the University of Applied Sciences Hof. A group
of engineering students specializing in mechatronics was
tasked with a signal-processing exercise.
2. Mechatronics
The students in the bachelor course “industrial engineering” specializing in mechatronics have to master a series of laboratory exercises designed to deepen their understanding of signal processing theory and its application in mechatronic systems. In one of these exercises the students are tasked with defining and implementing a criterion for stopping the motor of a car’s power window when something or someone is clamped in the window.
Figure 1. Experimental setup for the mechatronical assignment

Figure 2. Simple program to study the behaviour of the motor current
2.1. The laboratory exercise
The laboratory setup comprises a car door with a power window, power electronics to drive the mechanics, measurement circuitry to pick up the motor current, and a controlling PC running Agilent Vee (figure 1).
“Agilent VEE is an easy-to-use intuitive graphical test & measurement software that provides a quick and easy path to measurement and analysis.” [1] The software offers visual dataflow programming. The students “only” need to connect a few blocks – signal source, signal processing, signal sink – to implement their first application: a simple data-acquisition and display program needed to analyze the motor current and find criteria for detecting that something is clamped in the window. Figure 2 shows a solution for this first step.
This first application’s main function is the dataflow from the analogue-digital converter (source block) to the scope display (sink block at the far right), represented by the thin line linking source and sink. The remaining blocks to the left of figure 2 are needed to set up the ADC and to implement an infinite loop to read and display the data. This program sequence is specified with a second type of connecting lines representing the control flow (thick lines).
The students are then instructed to conduct a series of experiments with and without objects clamped in the window to find a criterion for detecting that something is clamped. This step profits from the intuitive way of combining signal sources, signal-processing blocks, and sinks (displays, ...) offered by the dataflow-oriented software.
Once a criterion is established, its implementation, which is usually a combination of calls to existing signal-processing blocks, is quickly found – again thanks to the dataflow design.
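For illustration, a criterion of this kind – a threshold on a smoothed motor current – might look as follows in conventional code. The window size and threshold values are invented for the example, not taken from the exercise.

```python
def clamped(samples, window=8, threshold=4.0):
    """Return True when the moving average of the last `window` motor
    current samples exceeds a threshold, i.e. something blocks the window."""
    if len(samples) < window:
        return False
    avg = sum(samples[-window:]) / window
    return avg > threshold

normal = [2.0] * 20               # free-running motor: current stays low
blocked = [2.0] * 12 + [6.0] * 8  # current rises when something is clamped
print(clamped(normal), clamped(blocked))  # False True
```

In Agilent Vee the same pipeline would be built by wiring a source block through an averaging block into a comparator, which is exactly what makes this step so quick for the students.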
Making the criterion stop the motor, however, is not so easy. The intuitive program used so far is straightforwardly extended to periodically read the ADC and the window buttons and to write the motor up/down bits. The criterion still works, since it is connected to the dataflow from the ADC. But as soon as the criterion stops the motor, the data from the ADC no longer indicate a clamped object (the motor current drops to zero), so the criterion allows the motor to run again. This produces data that make the criterion stop the motor, which in turn restarts it. The window ends up performing a jerking motion.
To overcome this problem the students have to add states to the software. They can do so in a dataflow-compatible way, by adding a feedback variable that is written at the end of the dataflow and read at its beginning. The students usually reject this approach as unintuitive. The alternative is a more complex control flow consisting of two nested loops: the outer loop is the data-acquisition loop used so far; the inner loop is entered once the criterion has fired and blocks the application until the window buttons are released.
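The latching behaviour of the inner loop can be sketched as a small state machine in conventional code. This is a schematic restatement of the logic with invented names, not the students' Vee diagram:

```python
def controller(events):
    """Latching stop logic: once the clamp criterion fires, stay in the
    'blocked' state (motor off) until the buttons are released, so the
    zero current measured after stopping cannot restart the motor."""
    state, out = "run", []
    for button_pressed, clamp_detected in events:
        if state == "run":
            if clamp_detected:
                state = "blocked"  # criterion fired: stop and latch
            motor_on = button_pressed and state == "run"
        else:
            # blocked: ignore the (now zero) current, wait for release
            if not button_pressed:
                state = "run"
            motor_on = False
        out.append(motor_on)
    return out

# button held, clamp at step 2, released at step 4, pressed again at step 5
events = [(True, False), (True, True), (True, False),
          (False, False), (True, False)]
print(controller(events))  # [True, False, False, False, True]
```

Without the `state` variable the motor would oscillate exactly as described above; the latch is what the feedback variable or the inner loop contributes in the Vee solution.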
Figure 3 shows a rather well structured solution for the exercise. The software is still functional but no longer easy to understand. The original loop is marked with “data acquisition”. The additional code is necessary to implement the criterion and the “unjerky” stop function.
But there is no rose without a thorn – the initially very steep learning curve in this exercise becomes considerably flatter as soon as the students need to add the control flow mechanisms that prevent the jerky movement. At this point they ask the tutor to find an implementation for their solution.
Figure 3. Example of a (good) solution (annotations: “when buttons pressed, start motor and enter inner loop”; “data acquisition”; “stop motor; leave inner loop when buttons released”)
2.2. The learning curve
Section 2.1 gives an impression of the exercise’s complexity and size. The exercise is run in a four-hour session with one experienced tutor for four to five groups. Each group of four to five students works in parallel on a different exercise. The students have no prior knowledge of Agilent Vee. The instructions for the exercise contain some hints regarding the criterion and the usage of Agilent Vee; the initial program shown in figure 2 is part of these instructions.
Good students require about one hour of tutoring with respect to the criterion and Agilent Vee and are then able to implement working software like the version shown in figure 3.
This astounding performance is attributed to the visual programming interface offered by Agilent Vee. Other exercises of similar complexity implemented in C++ or BASIC show three to four times lower productivity, even though they are run with undergraduate students in computer science who have extensive prior knowledge of the programming language and the environment.
The key element seems to be Agilent Vee’s intuitive way of connecting (existing) software blocks by dataflow lines. In C++ or BASIC such a link would be implemented as a function/method call and would probably require some data conversion from one call to the next. Agilent Vee handles the conversions transparently and therefore allows the user to concentrate on the application.
2.3. Evolvability
Figure 3 shows that the initially attractive, intuitive visual program quickly becomes a poorly structured, confusing diagram of linked blocks. Without the manually inserted boxes, the code is almost unintelligible even though this example is fairly simple. More complex applications will have an even more confusing structure and therefore need even more documentation.
The need to use structuring elements – e.g. hierarchical blocks – along with the increased need for documentation significantly reduces the productivity of the visual approach.
In “one-shot” projects where no revisions are necessary, this lack of evolvability is not a problem. [10] describes a field of industry using almost exclusively this kind of software project. In this context the main challenge is the reuse of code modules in a large software framework. With the right kind of “building blocks” in the visual programming environment, the reuse of powerful code modules is facilitated.
3. Conclusion
The results clearly show that with visual programming the learning curve is indeed steep compared to textual source code. The students produced impressive results rather quickly, especially as long as big code blocks could be reused by coupling them together to calculate the criterion. To stop the motor properly, control flow elements have to be used as well. This rapidly results in a complex diagram that might be hard to evolve in future versions.
In the author’s opinion visual programming is a powerful approach that allows highly functional applications that efficiently reuse code blocks to be built quickly. On the other hand, these applications are not evolvable and should be considered “one-shot” customizations requiring a complete (quick) rewrite for the next version. [2, 5, 6, 8, 9, 18] exemplify the demand for this kind of software project and tool in industry.
References

[1] Agilent. Agilent VEE Pro 9.0. http://www.home.806312.00&id=1476554&cmpid=20604, February 2009.
[2] M. C. Andrade, C. E. Moron, and J. H. Saito. Reconfigurable system with Virtuoso real-time kernel and TEV environment. Symposium on Computer Architecture and High Performance Computing, pages 177–184, 2006.
[3] C. Andronic, D. Ionescu, and D. Goodenough. Automatic code-generation in a visual-programming environment. Proceedings of the Canadian Conference on Electrical and Computer Engineering, pages 6.30.1–6.30.4, September.
[4] J. Bartholdt, R. Oberhauser, and A. Rytina. An approach to addressing entity model variability within software product lines. 3rd International Conference on Software Engineering Advances, pages 465–471, 2008.
[5] G. Bayrak, F. Abrishamchian, and B. Vogel-Heuser. Effiziente Steuerungsprogrammierung durch automatische Modelltransformation von Matlab/Simulink/Stateflow. Automatisierungstechnische Praxis (atp), 50(12):49–55, December 2008.
[6] J. C. Galicia and F. R. M. Garcia. Automatic generation of concurrent distributed systems based on object-oriented approach. Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Systems, 16:809–814, 2004.
[7] O. M. Group. UML standard. spec/UML/2.1.2/, February 2009.
[8] R. Härtel and T. Gedenk. Design and implementation of CANopen devices. Proceedings of the iCC 2003, pages 04–1 – 04–6, 2003.
[9] P. Lauer. Auto generated production code for hydraulic systems. Proceedings of the 6th International Fluid Power Conference, Dresden, 1:171–182, 2008.
[10] V. Plenk. A benchmark for embedded software processes used by special-purpose machine manufacturers. In Proceedings of the Third International Conference on Software Engineering Advances, pages 166–171, Los Alamitos, 2008. CPS.
[11] P. Stöhr, S. Dörnhöfer, R. Eichinger, E. Frank, D. Krögel, M. Markl, M. Nowack, T. Sauerwein, J. Steffenhagen, and C. Troger. Untersuchung zum Einsatz von Robotern in der Informatikausbildung. Internal report, University of Applied Sciences Hof, July 2008.
[12] Wikipedia. Agilent VEE. http://en.wikipedia.org/wiki/Agilent_VEE, February 2009.
[13] Wikipedia. Grafcet. wiki/Grafcet, February 2009.
[14] Wikipedia. Labview. wiki/Labview, February 2009.
[15] Wikipedia. Mindstorms NXT. http://en., February.
[16] Wikipedia. Sequential function chart. function_chart, February 2009.
[17] Wikipedia. Simulink. wiki/Simulink, February 2009.
[18] D. Witsch, A. Wannagat, and B. Vogel-Heuser. Entwurf wiederverwendbarer Steuerungssoftware mit Objektorientierung und UML. Automatisierungstechnische Praxis (atp), 50(5):54–60, 2008.
Advantages and Limits of Diagramming
Jaroslav Král
Michal Žemlička
Charles University in Prague, Faculty of Mathematics and Physics
Department of Software Engineering
Malostranské nám. 25, 118 00 Praha 1, Czech Republic
The importance of software diagrams is often overemphasized as well as underrated. We show that diagrams should be used especially in cases where (weakly structured) texts close to natural language must also be used. Examples are requirements specifications, software architecture overviews, and formulations of ideas or basic principles of solutions. The fact that texts must be used together with diagrams is often considered a disadvantage, as both "formats" must be synchronized, which is not an easy task. We show that a proper combination of the advantages and disadvantages of texts and diagrams can bring great benefit.
The advantages of diagrams in the late stages of SW development are not clear. One solution is to use diagrams in the initial stages of development only; an example of such a strategy is agile development. The second way is to suppress the importance of code, as in model-driven architecture (MDA). MDA works well for small and routine projects, but the application of diagrams in large projects leads to complex systems of complex diagrams. Such diagrams are not too useful; this may be the main reason for the limited success of MDA.
1. Introduction

Diagrams and the practices using them are considered to be very helpful and easy to understand and use. Experience indicates, however, that the use of diagrams is not without issues. Diagram notations have been evolving quite rapidly, maybe quicker than software development paradigms. Some aspects of software system structure and use have several different modeling frameworks, and several diagram types are used to model the same entities, e.g. business processes or workflows. A workflow can be described by activity diagrams in UML or by diagrams in Aris [4], and there are also two systems of workflow modeling languages designed by W3C and WfMC. This indicates that the modeling needs and the properties of best modeling practices are not clear enough. The semantics of the diagrams is vague; under certain circumstances this need not be wrong. The diagrams do not support newly invented constructs – an example is service government (compare the history of exceptions in flow charts).

There are doubts whether diagrams are of any use in software maintenance, as the updates of code and the updates of diagrams are usually not well synchronized and the diagrams therefore tend to become obsolete. Some methodologies like extreme programming [2] forbid any use of diagrams for maintenance, or require, like the Agile Programming Manifesto [3], that diagrams be used as an auxiliary means only. An intended exception is Model Driven Architecture (MDA, [7]), where code has an auxiliary role and is generated from diagrams. It has some drawbacks, discussed below.

On the other hand, the use of diagrams in the early phases of development is quite common. But it can, as noticed, lead to a situation where a software system has two defining/describing documentations – the code and the supporting diagrams, the latter often being obsolete.

2. Engineering Properties of Diagrams

The graphical nature of diagrams implies the following properties:

1. Diagrams consisting of many entities are unclear, as humans are unable to follow more than about ten entities at once. Diagrams are therefore not too advantageous for modeling complex systems; this is confirmed by observation. A solution can use decomposition of the system into autonomous components (e.g. services in SOA) and hierarchical decomposition using subdiagrams, where the subdiagrams depict subsystems. The problem is that it is often difficult to do this well, both technically and conceptually.

2. It is often difficult to implement a "good" modification of diagrams, i.e. transformations retaining desirable properties of the transformed diagrams, like lucidity. This implies that the use of diagrams during software maintenance need not be helpful.

3. The semantics of diagrams tends to be vague in order to support intuitiveness and flexibility. This is good for specification, as in this case the semantics can be gradually "tuned". It does, however, partly disqualify diagrams as a code definition tool.
All these facts are straightforward. Their managerial consequences for process control are, however, often not properly taken into account. The consequence is that diagrams tend not to be useful for the maintenance of long-living systems.
3. Diagrams in Early Stages of Software Development
Diagrams are used in the early stages of software development. They are often used in requirements specification documents (RSD). An RSD can be, and in MDA must be, highly formalized, but this need not be a good solution, as the semantics of such a formalized specification language is oriented towards the IT knowledge domain rather than the user knowledge domain. It can disturb the focus on user visions and user needs, as the semantics of the RSD can be far from the semantics of the user-domain languages; it therefore almost precludes effective collaboration with users during the formulation of the specification1.
A satisfactory solution is to use a specification language close to the professional users' knowledge domain language [5] and to use user-domain diagrams. The diagrams, like the specification languages, should be flexible enough to enable iterative specification techniques and a stepwise increase in the precision and depth of requirements.
Such diagrams are then well understood by users, so they can collaborate well with developers. In this case user knowledge domain diagrams can be, and usually should be, used. Such diagrams are used as long as the specification documents are used and updated.
If a larger system is to be developed, its overall architecture must be specified together with the requirements specification as the architecture determines the structure of the
requirements specification document. It is particularly typical for systems having service-oriented architecture (SOA).
The diagrams depicting some aspects of SOA are very useful; other aspects are still difficult to depict. It is often preferable to depict other overall (global) properties of the solution as well. The proper use of diagrams can substantially speed up the specification process and enhance the quality of the resulting specification.
1 It is one of the reasons why MDA has only limited success.
As the specification is a crucial document, sometimes even a part of formal agreements, it is kept up to date, and the above problems with diagrams becoming obsolete need not arise. Diagrams can help to explain the global properties of the system.
A crucial fact is that the diagrams are associated with text in a "natural" language – requirements in a form legible for customers, and informal descriptions of the system architecture or of some aspects of the solutions depicted by the diagrams.
3.1. Why Diagrams?
Diagrams can be something like a "materialization" of ideas. Like natural language, they can be as vague or incomplete as necessary at a given moment or according to the "state of the art" of a project. They can hide details, but they can be iteratively made more precise to achieve the needed exactness and completeness. This is simplified by the fact that they can be well integrated into text documents.
Many diagram types are intuitive and are part of professional languages. They should increase transparency, which is possible if they are not too complex; otherwise they can be worse than a structured text.
Some diagramming techniques provide an excellent tool
for thinking and enable an easy detection of thinking gaps.
The use of diagrams in specification documents increases the legibility and "visibility" of the requirements and supports the collaboration of developers and users. This is very important, as snags in specifications cause 80 % of development failures.
Diagrams are intuitively easier to understand for both developers and users. Almost no tiresome preliminary training of users, e.g. the reading of manuals or syntax training, is necessary. Diagrams are part of many user knowledge domains, and as such they can be used in specification documents.
Some global properties like the system architecture are
well depicted by proper diagrams. Incomplete diagrams can be useful, and the iterative development of diagrams supports iterative thinking as a multistep approximation process.
The missing or incomplete parts of diagrams are very
often well visible and it is clear how to insert the missing
parts. It is especially true if a connector notation is used.
Diagrams are especially good during the initial steps of solving an issue; they provide a powerful outline of a system, provided it is not too large.
It is worth mentioning that in all these cases the diagrams are used like figures or blueprints in technical and scientific publications and documents: they are in fact part of the (text) document. The role of the diagrams is so important that during any update of the text the diagrams are updated too. The problem of obsolete diagrams can then be avoided.
4. Diagrams in the Later Stages of the Software Life Cycle

Diagrams can be used in the later stages of the software life cycle, from design through maintenance. Typical aims can be:

1. The enhancement (better quality) of user interfaces, i.e. the enhancement of system usability (compare [6]).

2. The better understanding of the system requirements by system designers, coders, testers, and, sometimes, maintainers. As diagrams are difficult to modify properly, they are not too useful for maintenance; this is true especially for complex diagrams and tasks.

3. Implementation of a tool to support decisions during design, coding, and sometimes testing.

4. Code generators. This is typical for model-driven architecture.

5. Auxiliary tools for design and coding and for code

The use of diagrams in the ways described in 1 and 2 is a necessity rather than an option. The following applications of diagrams can have substantial positive effects:

• The use of diagrams as an auxiliary tool is reasonable and effective, provided that the diagrams are discarded.

• The practices mentioned in 3 are classical but not too satisfactory. Code is the most important document in classical practices. Changes are typically made in the code first and then, hopefully, in the related diagrams. Changes in text are often easier than changes in diagrams, so there is no strong need to update the diagrams. The result is that the diagrams become obsolete, and for some time this "does not matter". The effort needed to update the diagrams is then felt to be superfluous. The final state is that only the code matters – see the principles of agile development.

In large systems the diagrams are so complex that they lose the advantages discussed above. The use of complex diagrams can then become counterproductive.

5. The Case of Model-Driven Architecture

The main issue with software-development-oriented diagrams is that they are often only an auxiliary tool. One solution is that only diagrams with some additional information are used and no open source code exists, or it cannot be edited directly.

Sample surveys indicate that MDA is rarely used; compare [1], which contains the results of a survey in the Czech Republic. The reasons for this are:

1. The use of diagrams as a programming tool leads to decisions to use the diagrams as a specification means. This leads to the antipattern "premature programming", as the requirements are transformed into diagrams that need not be well suited to user knowledge domains and languages. The requirements are then usually not well formulated and, worse, are transformed or adapted to fit the MDA domain rather than the user knowledge domain.

2. The underlying automated code generation system (ACG) must ultimately be free of errors. It is usually hopeless for developers to repair failures of the ACG or to change the generated code for other reasons, for example for effectiveness. Errors in the ACG are superposed on errors in code compilers.

3. The (collection of) diagrams necessary to model a given system is very complex. It is then quite difficult to navigate across the "database" of diagrams; it can, e.g., be quite hard to look for some names or patterns.

4. Some phenomena cannot easily be described via MDA diagrams (e.g. some aspects of SOA service orchestration).

5. Small changes in the generated code can require large and laborious changes in the structure of the diagrams.

6. The use of MDA requires painful changes in software development practices. On the other hand, it fixes the current state of the art, e.g. the object-oriented attitude, for too long a time.

7. Current MDA is, on the other hand, too object-oriented. This can be disadvantageous if one wants to integrate batch systems or to design user-friendly interfaces, etc.

8. There is almost no guarantee that the lifetime of the MDA supporting system will be long enough to provide reliable support covering the entire system lifetime.

We can conclude that MDA is a promising concept, but at present it is well usable only for smaller, non-critical systems.
6. Texts as well as Diagrams
Some stages of software development must use both texts and diagrams; examples are specification and architecture description. It is reasonable to attempt to derive some advantage from this.
The most important advantage is the possibility of applying the general principles of writing well-formed documents. Such documents are reasonably structured in their text part, and their "graphical" part does not use cumbersome figures.
Plain text can be flexibly structured using standard means like paragraphs, chapters, abbreviations, links, indexes, and so on. Changes can be made very easily and can be easily logged. It can be objected that text is not clear and illustrative enough. Note, however, that the clearness of a text need not deteriorate as the document grows; this property is not observed for diagrams, the clearness of which falls with size.
A proper combination of texts and diagrams enables a more flexible structure of documents. It is good practice for technical documents.
Texts can now, using e.g. XML, be structured in a very sophisticated way, enabling e.g. very powerful document presentation in digital form. There are many powerful tools for text generation, searching, editing, etc. It is not too difficult to guard whether changes in the text were propagated into the diagrams (pictures) and vice versa.
Such an approach can substantially weaken the drawback of diagrams that there are no satisfactory tools for finding diagrams with similar semantics.
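The guarding mentioned above can be illustrated with a crude consistency check between a text and its set of diagrams. The naming and the figure-reference convention here are assumptions made for the example:

```python
import re

def unsynchronized(text, diagram_labels):
    """Return (figures referenced in the text but not drawn,
    diagrams drawn but never referenced) - a crude sync guard."""
    referenced = set(re.findall(r"[Ff]igure\s+(\d+)", text))
    drawn = set(diagram_labels)
    return referenced - drawn, drawn - referenced

text = "As Figure 1 shows ... compare figure 3."
missing, orphaned = unsynchronized(text, {"1", "2"})
print(missing, orphaned)  # {'3'} {'2'}
```

A check of this kind can run on every document update, flagging exactly the text/diagram drift that the paper identifies as the main cause of obsolete diagrams.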
All this works well for diagrams that do not directly define the program structure. For diagrams that do define the program structure and are used during system maintenance (like the diagrams used in MDA), the only feasible solution seems to be full equivalence of diagrams and code – in other words, the development of tools enabling the generation of code from diagrams and, vice versa, the generation of diagrams from code, provided the code fulfills some standards. Such tools are not fully available, but some solutions exist. The transformation diagrams → code is available in MDA systems; the transformation code → diagrams is known as reverse engineering. Available solutions are, however, not powerful enough, and MDA diagrams tend to be too programming oriented (see
Experiments with tools like ACASE [8] have shown that it is possible to build tools allowing code to be displayed as text or as a diagram. Users displayed the code as diagrams when working with simple algorithms. Complex algorithms were typically displayed as text, as it is then possible to see a larger part of the algorithm at once. Sometimes a combination was used: the critical control structure was shown as a diagram, the rest as text.
This clearly demonstrates the usability of such tools: beginners may start with the more intuitive diagrams, complex things may be displayed as text, and finally, when it is necessary to analyze the code, the combination of code and diagram can give the highest benefit.
7. Conclusions
Diagrams and other graphical means should be used during development as a tool supporting specification or as a means supporting the initial stages of a problem-solving process. A proper area of application is the description of the overall architecture and similar properties of systems. In these cases the diagrams should be combined with text in a structured natural language. If done properly, the combination of texts and diagrams can bring great benefits, as the advantages of both forms can be combined and their disadvantages eliminated. Existing CASE systems do not support such a solution well enough.
The use of diagrams in specification should be viewed as a use of diagrams as a natural-language "enhancement". The application of diagrams to describe and maintain the structure of a system in the small, e.g. to define programming constructs so that they can be maintained, has not yet been solved properly for large systems. It is not clear whether the use of such detailed diagrams for such a purpose is even a reasonable goal.
This research was partially supported by the Program
”Information Society” under project 1ET100300517 and
by the Czech Science Foundation by the grant number
References

[1] L. Bartoň. Properties of MDA and the ways of combination of MDA with other requirements specification techniques (in Czech). Master's thesis, Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic, 2006.
[2] K. Beck. Extreme Programming Explained: Embrace Change. Addison Wesley, Boston, 1999.
[3] K. Beck, M. Beedle, A. van Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, J. Kern, B. Marick, R. C. Martin, S. Mellor, K. Schwaber, J. Sutherland, and D. Thomas. Agile programming manifesto, 2001.
[4] IDS Scheer. Aris process platform.
[5] J. Král. Informační Systémy (Information Systems, in Czech). Science, Veletiny, Czech Republic, 1998.
[6] J. Nielsen. Usability Engineering. Academic Press, New York, 1993.
[7] OMG.
[8] M. Žemlička, V. Brůha, M. Brunclík, P. Crha, J. Cuřín, S. Dědic, L. Marek, R. Ondruška, and D. Průša. Projekt Algoritmy (The Algorithms project, in Czech), 1998. Software
PSS: A Phonetic Search System
for Short Text Documents
Jerry Jiaer Zhang Son T. Vuong
University of British Columbia, Canada
2366 Main Mall, Vancouver, B.C., Canada
PPS uses the relationships between words and the containing documents to create a dictionary for phonetic searches on single- and multiple-word, correctly spelled and misspelled, words and phrases.
The remainder of the paper is organized as follows: Section 2 presents the system design; Section 3 looks at the evaluation in terms of phonetic matching accuracy and efficiency; Section 4 concludes the paper and gives an outlook on future work.
It is the aim of this paper to propose the design of a search system
with phonetic matching for short text documents. It looks for
documents in a document set based on not only the spellings but also
their pronunciations. This is useful when a query contains spelling
mistakes or a correctly spelled one does not return enough results. In
such cases, phonetic matching can fix or tune up the original query by replacing some or all query words with new ones that are phonetically similar, hopefully achieving more hits. The system allows single- and multiple-word queries to be matched to sound-like words or phrases contained in a document set and sorts the
differs from many existing systems in that, instead of relying heavily on
a set of extensive prior user query logs, our system makes search
decisions mostly based on a relatively small dictionary consisting of
organized metadata. Therefore, given a set of new documents, the
system can be deployed with them to provide the ability of phonetic
search without having to accumulate enough historical user queries.
2. System Design
This section presents two parts. The first is the creation of the
dictionary data structure PPS relies on. The second is the
phonetic matching mechanism based on the dictionary.
2.1. Dictionary Creation and Maintenance
We organize the data in a way that allows fast access, easy
creation and maintenance. The data structure storing the
documents serves as a dictionary for non-linear lookups. It also
contains meta-data that describes document properties for
multi-word sound-based searching.
Index Terms – Phonetic Match, Search
2.1.1. Text Processing
Given a document, it is not difficult to break it into words. In
PPS, we identify words by matching predefined regex patterns against the text. A set of unique words containing letters and digits is extracted by this process.
1. Introduction
With the ever increasing amount of data available on the
Internet, quickly finding the right information is not always
easy. Search engines are being continuously improved to better
serve this goal. One useful and very popular feature is phonetic
matching. Google’s “Did you mean” detects spelling errors when few matches are found and suggests corrections that sound like the original keywords. Yahoo and MSN use different names, “We have included” and “Were you looking for”, but they do essentially the same thing. This feature has become so popular that almost all major search engines cannot run without it. However, as much as it is in demand, not many websites can afford to provide this kind of user experience, mostly due to a technical limitation: building a statistical model for word retrieval and correction requires an extensive set of historical queries. PPS addresses this gap. It is a search system relying on a relatively small, self-contained dictionary, with a phonetic matching ability similar to what the big websites can offer.
In this paper, we propose the design of PPS, which only requires a small data set to function. It focuses on the correlations among different words and phrases, as well as the relationships between words and their containing documents.
2.1.2. Dictionary Creation
The processed documents can then be used to create the
dictionary that carries not only the original document text but
also additional information that describes their properties. The following sections discuss the creation of these properties, which are stored as metadata together with the original documents they are derived from.
Word List
A Word List is merely a list of distinct words that appear in
the document retrieved during the text processing phase. It is
sorted alphabetically. Sorting can be somewhat expensive, but there are two reasons for doing it. First, documents tend to be static once they are stored in the database, so sorting usually only needs to be performed once per document. Second, the overhead of dictionary creation does not add to the search run time, so it is preferable to organize the data in a way that facilitates search performance.
facilitates search performance. We can use binary search on the
number of results is lower than the predefined configurable
Result Size threshold, the system starts phonetic matching.
Then results are ranked based on their relevance to the query
and only those that exceed a predefined Sound-Like threshold
are returned.
sorted list for word matching to achieve O(lgn) time
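The O(lg n) lookup on the sorted Word List can be sketched with Python's standard `bisect` module (a generic illustration, not PPS's actual code):

```python
from bisect import bisect_left

def contains(sorted_words, word):
    """O(log n) membership test on an alphabetically sorted word list."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word
```

A per-document Word List built once at dictionary-creation time, e.g. `sorted({"curry", "house", "sushi"})`, can then be probed with `contains(word_list, "sushi")`.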
tf-idf Weight
tf-idf Weight is a statistical measure that evaluates the importance of a word to a document in a set of documents [1][7]. It is obtained by multiplying the Term Frequency and the Inverse Document Frequency. A high tf-idf weight is achieved by a high term frequency in the given document and a low frequency of the term across the whole set of documents. Therefore, terms appearing commonly in all documents, or infrequently in the considered document, tend to be given low weights and thus can be filtered out [8].
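A minimal sketch of this weighting follows; it uses the plain tf and logarithmic idf variants, since the paper does not state which tf-idf variant PPS uses:

```python
import math
from collections import Counter

def tf_idf(term, doc_words, all_docs):
    """tf-idf = term frequency in the document x inverse document frequency.

    doc_words: list of words in one document; all_docs: list of such lists.
    Plain tf and log-idf are assumptions; PPS's exact variant is not given.
    """
    tf = Counter(doc_words)[term] / len(doc_words)
    df = sum(1 for d in all_docs if term in d)
    idf = math.log(len(all_docs) / df) if df else 0.0
    return tf * idf
```

Note how a term appearing in every document gets idf = log(1) = 0, so its weight vanishes, matching the filtering behavior described above.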
2.2.1. Word Matching
We use the Boolean Model to find matching documents. The search is based purely on whether or not the query word exists in the document word lists. The Boolean Model is quite efficient at this, since it only needs to know whether the qualified documents contain the queried terms.
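Boolean matching over the Word List metadata reduces to set membership; in this sketch the mapping of document ids to word sets is an assumed layout:

```python
def boolean_match(query_words, documents):
    """Return ids of documents whose word lists contain every query word.

    documents maps a document id to its word set (the Word List metadata);
    this layout is assumed for illustration.
    """
    return [doc_id for doc_id, words in documents.items()
            if all(q in words for q in query_words)]
```

For instance, with `{1: {"sushi", "bar"}, 2: {"noodle", "bar"}}`, querying `["bar"]` returns both documents, while `["sushi"]` returns only the first.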
2.2.2. Result Sorting
The retrieved document texts are represented as vectors in an algebraic model where each non-zero dimension corresponds to a distinct word in that document [6][7]. By building vectors for the respective documents, we can calculate document similarities by comparing the angles between them [2]. If we compare the angles between a query and the retrieved documents, we can tell how “close” each document is to the query. A common approach to calculating vector angles is to take the union of the terms in two documents as the dimensions, each of which contains the frequency of the word in that document. PPS improves on this for better accuracy.
First, instead of using the term frequency as the value of each vector dimension, we apply tf-idf weights to evaluate the importance of a word to the considered document [7], because a longer document might have a low proportional term frequency even though the term occurs more often there than it does in a much shorter document. In such cases, it is imprudent to simply prefer the longer one. We apply tf-idf weights because the local tf parameter normalizes word frequencies by the length of the document the words reside in, while the global idf parameter contributes the frequency of documents containing the search word relative to the whole document set. The product of the two parameters, the tf-idf weight, thus represents the similarity of two documents with respect to the local term frequency ratio and the overall document frequency ratio [9]. In other words, rare terms carry more weight than common terms. In our system a document is represented as a weight vector:
v = [tf-idf_1, tf-idf_2, tf-idf_3, ..., tf-idf_i]
where i is the total number of distinct words in the two documents.
Incorporating the above change, the sorting process works
the following way:
1. Construct two initial document vectors of the same
dimensions from the query and a document
2. Take the tf-idf weight values of the query and the
document from the dictionary and fill them into the
corresponding vector dimensions
3. Calculate the angle between the two vectors
4. Repeat steps 1 to 3 for each document in the result set
returned by Boolean text matching
5. Sort the result set by the cosine values of the angles. A
larger number indicates higher relevance of the
corresponding document
Double Metaphone Code
Double Metaphone indexes words by their pronunciations, generating two keys, primary and alternate, that represent the sound of a word [5]. To compare two words for a phonetic match, one takes the primary and alternate keys of the first word and compares them with those of the second word. The two words are considered phonetically matching only if their primary and/or alternate keys are equivalent [5].
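Given precomputed key pairs, the match rule reduces to a key intersection; the sketch below assumes the keys have already been produced by a Double Metaphone implementation (the example key strings in the test are illustrative, not real Double Metaphone output):

```python
def phonetic_match(keys1, keys2):
    """Two words match phonetically if any of their Double Metaphone keys
    coincide (primary/primary, primary/alternate, or alternate/alternate).

    keys1 and keys2 are (primary, alternate) pairs; an absent alternate
    key is represented as None and never matches.
    """
    k1 = {k for k in keys1 if k}
    k2 = {k for k in keys2 if k}
    return bool(k1 & k2)
```

Using a set intersection covers all four primary/alternate pairings in one expression.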
Local Phrase Frequency
The local phrase frequency keeps track of the frequency of phrases in a document. In the context of this paper, a phrase is two or more consecutive words in the same order as they appear in the containing document. We count phrase frequencies by grouping every two consecutive words and calculating the frequency, then grouping every three consecutive words and calculating the frequency, and so on until the grouping covers all words of the document. Phrases
derived from the above list are searched through the whole
document to count their occurrences. To prevent bias towards
longer documents, the occurrences are divided by the
document’s word length. The quotients thus serve as the phrase
frequencies. Each phrase, together with its frequency, is then
saved in a local phrase frequency table for each document. We
call it local because this value is independent of the content of
other documents in the document set.
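The n-gram counting and length normalization described above can be sketched as:

```python
from collections import Counter

def local_phrase_frequencies(words):
    """Count every group of 2..len(words) consecutive words, normalized by
    the document's word length to prevent bias toward longer documents."""
    n = len(words)
    counts = Counter(" ".join(words[i:i + size])
                     for size in range(2, n + 1)
                     for i in range(n - size + 1))
    return {phrase: c / n for phrase, c in counts.items()}
```

For the four-word document `["a", "b", "a", "b"]`, the phrase "a b" occurs twice, so its normalized local frequency is 2/4 = 0.5.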
Global Phrase Frequency
After the local phrase frequencies of a document are
calculated, they are added to the global phrase frequency table.
If the phrase exists in the table, its frequency is increased by the
local phrase frequency. PPS uses it to determine how often a set
of words occur together, as well as how frequently such a
combination appears across documents.
2.1.3. Dictionary Maintenance
When new documents are added to the document set, the dictionary is updated to adjust the relative term match strength derived from these documents. The major work is to re-calculate the tf-idf weights via a database script, which periodically processes the documents added since the last run and re-adjusts the properties related to the whole document set at the end.
2.2. Single Word Search
Searching for a single word involves finding all matching documents and sorting them in order of relevance.
2.2.3. Phonetic Matching
If the above step does not return the documents the user looks for, PPS starts phonetic matching. The system first performs a match operation assuming the spelling is correct. If still not enough results are returned, it performs another search operation with spelling correction.
Low Hits Resulting from a Correctly Spelled Query
PPS first tries to broaden the result by looking up sound-like words in the document set. Because words with the same or similar pronunciations are encoded into the same or similar Double Metaphone codes, a simple database query comparing the indexed Double Metaphone codes of two words will return a set of words that sound like the queried one. These words are sorted by how close their pronunciations are to the original word, measured by the Levenshtein Distance of their Double Metaphone codes. Because Double Metaphone codes are strings, we can apply the Levenshtein Distance to measure their differences and thus calculate the similarity of their sounds. Words that are phonetically identical always have the same Double Metaphone code, so their Levenshtein Distance is 0. As the pronunciations of two words become less alike, their Double Metaphone codes will differ in more characters and thus result in a greater Levenshtein Distance. The system computes the Levenshtein Distances between the query and the candidate words, and sorts the candidates accordingly.
Low Hits Resulting from an Incorrectly Spelled Query
If a query is misspelled, PPS first finds correctly spelled candidate words that are close to the query word, and then it ranks the candidates and returns the best-matched one(s). The next two sections discuss each of these steps in detail.
Find Candidate Corrections: We observed that in most cases a misspelled word had a Levenshtein Distance of no more than 3 from the correct word. We also noticed that errors tend to occur towards the end of words. Because we are only interested in candidates that are close to the query word, these two observations suggested that we could focus only on Levenshtein Distances of 1, 2, and 3 from the beginning portion of each word. It works as follows:
1. Given a query word of length n, set k = ⌈0.6n⌉, where k is the number of leading characters to be taken from the query.
2. If k ≤ 3, set k = min(3, n); else if k > 7, set k = 7. The lower bound of k guarantees there are enough permutations to form a Levenshtein Distance of 3. The upper bound of k is 7, which reflects our observation that the beginning portion of a query word is more likely to be correct, so the correction process uses this portion as the base for matching. The lower and upper bounds of k were based on our experiments; they seemed to be the golden numbers that balanced accuracy and efficiency.
3. Take the first k characters of the query word and generate a key set where each item is a key whose Levenshtein Distance is 1, 2, or 3 from the k-length string.
4. Check each key in the key set against the word list metadata of each document in the document set. Return the words that start with those same keys.
From our experiments, the size of the candidate corrections only ranges from a couple of words to at most several hundred in a considerably large document set, due to the large number of phonetically incorrect keys in the key set. Because of the relatively small data pool, we are able to implement a reasonably comprehensive scoring system to rank the candidates in order to find the best match.
Rank Candidate Corrections: Now that a list of candidate words has been found, the next step is to choose the best match(es). The ranking system takes the following factors into account.
First, the Weighted Levenshtein Distance from a candidate to the original misspelled query word. The reason to compare against the complete word rather than its first k characters is to ensure the evaluation reflects the relevance of a candidate to the query word as a whole. The concept is commonly used in bioinformatics, where it is known as the Needleman-Wunsch algorithm for nucleotide sequence alignment [4]. It makes sense in our application domain because, among all spelling mistakes, some are more likely to occur than others. Table 2.1 lists the considered operations and their costs in calculating the Weighted Levenshtein Distance.
Table 2.1: Operations for Weighted Levenshtein Distance calculation and their costs.
The cost associated with each operation was determined from our experiments; this combination seemed to produce better results than others. The Weighted Levenshtein Distance is the normalized total cost of performing these operations. If c is the total operation cost to transform a candidate into the query word, and n is the query word length, the score from this factor is obtained by normalizing c against n, where c is always less than or equal to n because the maximum cost of a single operation is no greater than 1.
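The weighted edit-distance computation can be sketched with the standard dynamic program; since Table 2.1's actual cost values did not survive extraction, the per-operation costs here are parameters defaulting to 1.0, and operation types such as the double-letter error are not reproduced:

```python
def weighted_levenshtein(a, b, ins=1.0, dele=1.0, sub=1.0):
    """Edit distance between strings a and b with per-operation costs.

    The real PPS cost table (Table 2.1) is not available, so uniform unit
    costs stand in for it; pass other values to weight operations.
    """
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # delete from a
                          d[i][j - 1] + ins,       # insert into a
                          d[i - 1][j - 1] + cost)  # substitute or keep
    return d[m][n]
```

With unit costs this reduces to the classic Levenshtein Distance, e.g. `weighted_levenshtein("kitten", "sitting")` gives 3.0.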
Next, Starting and Ending Characters of a candidate word
are checked against those of the query word. The more
beginning or ending characters the two words share in common,
the more likely the candidate is the correction of the misspelled
query. Our tests also showed that the closer a letter is to the middle of a word, the more likely a spelling mistake is to happen. We took this factor into account in the ranking system with a linear scoring function, which works as follows:
1. Set s = 0. Starting from the first letter of the candidate and the query word, check whether the two letters are identical. If they are, increment s by 1 and move on to the next letter (in this case, the second one) of both. Repeat this process until:
a. the two letters at the same position in the two words are not the same, or
b. the letter position is equal to half of the length of the shorter word.
2. Set e = 0. Starting from the last letter of the candidate and the query word, do the same as Step 1, except that it checks the second half of the words.
3. The final score for this factor is calculated as:
The query is re-formatted to have exactly one whitespace in between words to make it more search-friendly. Furthermore, phrase entries whose word lengths are less than that of the query string are neglected, because it is impossible for them to hold the query string.
(s + e) / min(n_q, n_c)
where n_q is the length of the query word and n_c is the length of the candidate word. The division normalizes the score to prevent bias toward longer words.
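The two scans and the normalization can be sketched as follows; note that only the min-length denominator of the printed formula survived extraction, so the `(s + e)` numerator is an assumption inferred from steps 1 and 2:

```python
def affix_score(query, candidate):
    """Score shared leading (s) and trailing (e) letters, each scan
    stopping at half the shorter word, normalized by the shorter length.

    The (s + e) numerator is an assumption; the source formula shows only
    the min(n_q, n_c) denominator.
    """
    half = min(len(query), len(candidate)) // 2
    s = 0
    while s < half and query[s] == candidate[s]:
        s += 1
    e = 0
    while e < half and query[-1 - e] == candidate[-1 - e]:
        e += 1
    return (s + e) / min(len(query), len(candidate))
```

For example, `affix_score("george", "georme")` matches three leading letters and one trailing letter, giving 4/6.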
Third, the Double Metaphone code of both the candidate
word and the query word are compared to calculate the third
score based on their pronunciations:
1. if the primary key of the candidate is the same as the
primary key of the query word, the candidate gets 0.3
2. else if the primary key of the candidate is the same as the
alternate key of the query word, or if the alternate key of
the candidate is the same as the primary key of the query
word, the candidate gets 0.2
3. else if the alternate key of the candidate is the same as the
alternate key of the query word, the candidate gets 0.1
4. else if none of the above three conditions is met, the
candidate gets 0
The maximum score a candidate can possibly get from this
factor is 0.3, which is lower than the other two factors. We
made this decision based on two reasons. First, due to the
complexity and the “maddeningly irrational spelling practices”
[5] of English, the Double Metaphone algorithm may fail to
generate unique codes to distinguish certain words. The second and more important reason is that, even if there were a perfect phonetic mapping algorithm that could distinguish every single pronunciation, it still could not account for words that sound the same but differ in meaning. Such words are known as homophones. Because it is unlikely that users would misspell a word as one of its homophones, we had to be careful not to rely too heavily on phonetic similarity. This is why the
Double Metaphone score is weighted only about 1/3 of the
previous spelling-oriented factors.
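The tiered scoring of this third factor can be sketched directly from the four conditions above; the key pairs are assumed to be precomputed Double Metaphone (primary, alternate) keys, and the key strings used in the test are illustrative:

```python
def metaphone_score(query_keys, cand_keys):
    """Tiered pronunciation score, capped at 0.3 as described in the text.

    query_keys and cand_keys are (primary, alternate) Double Metaphone key
    pairs; an absent alternate key is None and never matches.
    """
    qp, qa = query_keys
    cp, ca = cand_keys
    if cp and cp == qp:                            # primary == primary
        return 0.3
    if (cp and cp == qa) or (ca and ca == qp):     # primary/alternate cross
        return 0.2
    if ca and ca == qa:                            # alternate == alternate
        return 0.1
    return 0.0                                     # no keys coincide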
Search on Phonetically Similar Word: Now that the best
phonetically matched word is found, the system performs a
single word search using the new word as the query word to
find the result documents. This time, no further phonetic matching is needed, because the new query comes from the document set itself, which guarantees a non-empty result set.
2.3.2. Result Sorting
Sorting is based on the importance of the query string to both
the document and the whole document set. Therefore, this is
where both the local and the global phrase frequency tables are
needed. Sorting for multi-word queries is actually much simpler than for single-word queries, because the Vector Space Model with tf-idf weights used for single words does not apply here: not all counted phrase frequencies are meaningful. For example, given the sentence “How are you”, “how are” is a valid phrase in PPS, but it does not function as a meaningful unit in the sentence. On the other hand, in our tests, the simple phrase frequency comparison worked well.
Each document gets a score which is the product of the local
and global phrase frequencies of the query string. The higher
the score is, the more relevant that document is to the query
string. This method produces reasonably good results because it
takes into account the importance of a phrase both locally to the
document and globally to the whole document set.
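The local-times-global scoring can be sketched as follows; the table layouts (document id to local frequency table, plus one global table) are assumed for illustration:

```python
def phrase_scores(query, local_tables, global_tables):
    """Score each document as local frequency x global frequency of the
    query phrase, sorted so higher (more relevant) scores come first.

    local_tables maps a document id to {phrase: local frequency};
    global_tables maps {phrase: global frequency}. Both layouts are
    illustrative assumptions.
    """
    return sorted(
        ((doc_id, table.get(query, 0.0) * global_tables.get(query, 0.0))
         for doc_id, table in local_tables.items()),
        key=lambda pair: pair[1], reverse=True)
```

A phrase absent from either table contributes a score of zero, so documents that never contain the query phrase sink to the bottom.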
2.3.3. Phonetic Matching
Similar to single-word search, if the strict text-based
matching does not return satisfying results for the phrase, the
system starts the sound-based search following these steps:
1. Break a query phrase into a list of single words.
2. For each word, perform the single-word phonetic
matching operation to retrieve a list of top candidates.
3. Consider all possible permutations of the candidate lists
by taking one word from each of them. For each
permutation, refer to the global phrase frequency table to
get its global frequency in the whole document set. This is
called correlation check.
4. After all permutations are generated and their global phrase frequencies are checked, return the one with the highest frequency.
As the query size increases, the number of permutations from all candidate word lists grows exponentially. Fortunately, we observed that a permutation can be generated incrementally, by selecting one word from each candidate list in turn and concatenating the selections. This means that before a permutation is formed, all entries in the global phrase frequency table are possible matches. Then, the first word from the first candidate list is
chosen as the first element of the permutation. At this point,
those phrase frequency entries not containing the same first
word can be purged. Next, the second word from the second
candidate list is chosen as the second element of the
permutation. Among the phrase frequency entries left from the
previous selection, those without the same second word can
also be purged because there will be no match to the whole
permutation for sure. The process goes on till either the
permutation is completed or there is no phrase frequency entry
left. If the permutation is completed, it means there is a match
in the phrase frequency table. Otherwise, there is no such a
phrase that can match the incomplete permutation from its first
element up to its last that is generated right before the process
stops. Therefore, all permutations with the same beginning elements as the incomplete one can also be purged. Moreover, we can further optimize the process by reducing the phrase frequency pool to only those entries with the same number of words as a complete permutation. This limits the initial data set size and makes the process converge more quickly. Figure 2.1 is an example of the optimized permutation generation process on a three-word query.
Suppose there are five three-word phrases in the phrase frequency table: “A C A”, “B A B”, “B C A”, “C C A”, and “C C B”. Regardless of the size of the original phrase frequency table, these five are always the ones to begin with, because any other phrases with more or fewer words are purged. For simplicity, each candidate list has four words, “A”, “B”, “C”, and “D”, to choose from, and there are three such candidate lists. Therefore, without optimization, a total of 64 (4 × 4 × 4) permutations would be needed for the five existing phrases.
2.3 Multiple Word Search
Similar to single word search, there are two stages involved in multiple word search:
1. The system performs a text matching search. If the queried phrase is found in more than the Result Size number of documents, the system sorts and returns all of them.
2. If the queried phrase is not found, or exists only in fewer documents than the Result Size threshold, the system performs a phonetic matching search, then sorts and returns the results.
As in single word search, these stages comprise three steps, Phrase Matching, Result Sorting, and Phonetic Matching, but their implementations are somewhat different.
2.3.1. Phrase Matching
The Boolean Model is applied to check the phrase against the local phrase frequency tables. Since each table consists of entries of two or more words separated by a whitespace, the query also needs to be re-formatted to have exactly one whitespace in between words.
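The prefix-pruned generation described above can be sketched as a depth-first search; the function name and the flat-list table layout are illustrative, not PPS's actual implementation:

```python
def pruned_search(candidate_lists, phrase_table):
    """Generate permutations depth-first, abandoning a branch as soon as
    no phrase in the table starts with the partial permutation.

    candidate_lists: one list of candidate words per query position;
    phrase_table: whitespace-joined phrases (assumed layout).
    """
    matches = []

    def extend(prefix, pool, depth):
        if not pool:                       # branch purged: no phrase fits
            return
        if depth == len(candidate_lists):  # complete permutation matched
            matches.extend(pool)
            return
        for word in candidate_lists[depth]:
            new_prefix = prefix + [word]
            survivors = [p for p in pool
                         if p.split()[:len(new_prefix)] == new_prefix]
            extend(new_prefix, survivors, depth + 1)

    # Keep only entries with as many words as a complete permutation.
    table = [p for p in phrase_table
             if len(p.split()) == len(candidate_lists)]
    extend([], table, 0)
    return matches
```

Running it on the worked example, `pruned_search([["A", "B", "C", "D"]] * 3, ["A C A", "B A B", "B C A", "C C A", "C C B"])` finds exactly the five stored phrases while skipping purged branches such as "A A" and "D". PPS would then pick the surviving permutation with the highest global phrase frequency.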
3. Evaluation
This section discusses the performance of PPS. A simulator
and test data were created to search for restaurant names
throughout the Greater Vancouver Region. We examine the
effect of single- and multiple-word searches with phonetic
matching. By comparing the results to the actual data in the test
document set, we evaluate the search accuracy and running
time of the system with different types of inputs.
3.1 Simulation Setup
3.1.1 Test Data Pool
The test data is a set of restaurant names in the Greater Vancouver Region. We built a crawler with the free software Web Scraper Lite [3] to grab restaurant listings and extract them into MySQL. The data pool consists of more than 3800 restaurant names. We chose restaurant names as our
test data for two reasons. First, PPS was designed specifically for short text documents; the lengths of restaurant names usually varied from one to eight words, and thus made good test data for the evaluation. Second, many of the names were non-English, so phonetic matching would be useful.
3.1.2 Test Input
We created the test input in two stages. First, a set of
correctly spelled words and phrases were generated. These
words and phrases must not appear in the test data pool. Second,
we created a set of misspelled words and phrases with a
Levenshtein Distance greater than zero but less than or equal to
five from the existing words and phrases in the test data pool.
There are 1000 inputs in total for the test. Table 3.1 is a
summary of the types and the sizes of the input we tested on.
Figure 2.1: Optimized permutation generation process on a
three-word query
Let us see how the optimization speeds up this process. We begin by picking “A”, the first word in the first list, as the first element of the permutation. Because only one of the five phrases starts with “A”, the other four do not need to be checked for the rest of this permutation. Then another “A”, the first word in the second list, is picked as the second element of the permutation. Now this permutation starts with the words “A A”. Because the only phrase left from the last selection does not start with “A A”, we can stop this permutation and any other permutations starting with “A A”. In Figure 2.1, “A A” is surrounded by a dotted border to represent the termination of this “branch”. Next, we pick the second word “B” from the second list. Similarly, no phrases start with “A B”, so any permutations of “A B X”, where “X” can be “A”, “B”, “C”, or “D”, are ignored. Thus, “A B” is also surrounded by a dotted border. Next, we pick “C” from the second list to form “A C”. Because “A C A” matches the partial permutation “A C”, we can move on to the third list and select “A” from it to form the first complete permutation, “A C A”. At this point, we find a match, and no further permutations of “A C X” are performed, because we know there is only one phrase of the form “A C X”. Instead of 16 permutations and comparisons, only one permutation is generated and 5 comparisons are made between the incomplete permutations and the phrase entries for all permutations starting with “A”. The same steps are repeated for the rest until all phrases are found. One extreme case is a permutation starting with “D”: all 16 “D X X” combinations are ignored, a saving of 16 generations and 15 comparisons. In Figure 2.1, only a total of 6 complete permutations are generated, and 5 of them are matches. In our tests, this optimization saved over 90% of the time on average.
Input Type: Correct Word, Correct Phrase, Misspelled Word, Misspelled Phrase
Table 3.1: Test input types and sizes
3.1.3 Simulator
We implemented a simulator in PHP to query the test data
pool with the test inputs and to collect the test results. What it
essentially does is the two searching stages for single- and
multiple-word queries described in the previous chapter.
3.2 Simulation Results
The primary goal of the simulation is to evaluate the
accuracy of phonetic search when dealing with different types
of input: correct word, correct phrase, misspelled word, and
misspelled phrase. We will discuss each of them in this section.
Input Type (rows): Correct Word, Correct Phrase, Misspelled Word, Misspelled Phrase; columns: # of Queries, # of Matches
Table 3.3: Number of phonetic matches from the correct word and phrase queries, and from the misspelled word and phrase queries
The first two rows of Table 3.3 are the search results when a
query was a correctly spelled word or phrase. The system yielded a 95.6% success rate on single-word queries. This is because the search process gradually increased the Levenshtein Distance between the Double Metaphone code of the query word and those of the documents until it found the first match. For the queries where the system did not find a match, the test words had been generated so randomly that their Double Metaphone Levenshtein Distance from every document equaled the length of the Double Metaphone code itself; in other words, these words did not sound like any words in the test data. Searching for correctly spelled phrases yielded a lower 86.4% success rate. This is because the system needs to find candidate words that are phonetically close to every word in a query phrase: if any word returns an empty candidate list, the matching stops. Furthermore, the more words a query phrase has, the less likely there is a match in the document set. This observation was supported by the fact that, among the 13.6% of unsuccessful query phrases, most consisted of five or more words.
The last two rows of Table 3.3 are the search results when a
query word or phrase was misspelled. Single word queries yielded a high 89.2% success rate. When we generated the test input, we intentionally kept all queries within a Levenshtein Distance of 5 to model common error patterns; this is why phonetic matching worked well with spelling mistakes. It came as a small surprise that the unsuccessful words were the shorter ones. We think this is because, after normalization, even a small Levenshtein Distance can be proportionally large for short words. For misspelled phrases, the success rate is close to that of the correctly spelled counterpart. This was expected, because the decisions were made on the same factors, the Local and Global Frequencies.
Input Type (rows): Text-based Word Search, Text-based Phrase Search, Sound-based Correct Word Search, Sound-based Correct Phrase Search, Sound-based Misspelled Word Search, Sound-based Misspelled Phrase Search
4. Conclusion and Future Work
In this paper, we introduce PPS, a search system based on both text and sound matching for short text documents. The system incorporates several widely adopted algorithms into its staged searching process to deal with different search cases. Each stage has its own scoring model, built upon common algorithms and the metadata specifically prepared for it. The various metadata associated with documents are the
keys to the dictionary-based approach our system takes for
phonetic searching. We provide a high level design specifying
the system implementation from dictionary creation and
maintenance to text- and sound-based matching for various
types of queries. We also evaluate the system performance
under these circumstances. The results suggest that our system
meets its design goal with respect to accuracy and efficiency.
There are several areas in the development of the system that deserve further exploration. First of all, stopwords like “the”, “to”, “not” or “is” appear much more often than other words but carry very little information. Building dictionary metadata for them is expensive and usually useless. It could be helpful to skip these stopwords without sacrificing correctness when matching a phrase like “to be or not to be” that contains only stopwords. Secondly, when searching for misspelled words, the current design does not take missing whitespace into account. Consider the word “georgebush”. PPS would return something like “Georgetown”, while a better match might be “George Bush”, which would be found by inserting whitespace into the keyword. Similarly, we could also consider combining words. Again, the challenge here is to find the right granularity to balance accuracy and efficiency.
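The missing-whitespace case could be handled by testing split points against the dictionary; the dictionary contents and the single-split restriction below are illustrative assumptions, not PPS's actual design:

```python
def split_candidates(keyword, dictionary):
    """Try every single split point; keep splits where both halves
    are known dictionary words (e.g. "georgebush" -> "george bush")."""
    hits = []
    for i in range(1, len(keyword)):
        left, right = keyword[:i], keyword[i:]
        if left in dictionary and right in dictionary:
            hits.append((left, right))
    return hits

words = {"george", "bush", "town", "georgetown"}
print(split_candidates("georgebush", words))  # [('george', 'bush')]
```

Allowing more than one split point, or combining adjacent query words, generalizes this, at the cost of a larger candidate space — the accuracy/efficiency trade-off noted above.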
References

[1] William B. Frakes and Ricardo A. Baeza-Yates, editors. Information Retrieval: Data Structures & Algorithms. Prentice-Hall, 1992.
[2] Yuhua Li, Zuhair A. Bandar, and David McLean. An
approach for measuring semantic similarity between words
using multiple information sources. IEEE Transactions on
Knowledge and Data Engineering, 15(4):871-882, 2003.
[3] Web Scraper Lite.
[4] Saul B. Needleman and Christian D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443-453, March 1970.
[5] Lawrence Philips. The double metaphone search algorithm. C/C++ Users J., 18(6):38-43, 2000.
[6] Vijay V. Raghavan and S. K. M. Wong. A critical analysis of vector space model for information retrieval. Journal of the American Society for Information Science, 37(5):279-287, 1986.
[7] G. Salton, A. Wong, and C. S. Yang. A vector space
model for automatic indexing. Commun. ACM, 18 (11):613
-620, 1975.
[8] Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5):513-523, 1988.
[9] Gerard Salton and Michael J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY, USA, 1986.
Table 3.4: Running time of different types of searches
Table 3.4 shows the running times of all six types of searches. It was no surprise that the simplest text-based searches took the least time, while the sound-based misspelled word and phrase searches took the longest. It is worth mentioning that the maximum average search time is merely over a second, and all types of searches have a small standard deviation compared to their average times. Combined with Table 3.3, we conclude that the system behaves reasonably fast and stably, with an over 80% success rate, regardless of the type of input.
Hybrid Client-Server Multimedia Streaming Assisted by Unreliable Peers
Samuel L. V. Mello, Elias P. Duarte Jr.
Dept. Informatics – Federal University of Parana – Curitiba, Brazil
E-mail: {slucas,elias}
Abstract

Stream distribution is one of the key applications of the
current Internet. In the traditional client-server model the
amount of bandwidth required at the streaming source can
quickly become a performance bottleneck as the number of
users increases. Using peer-to-peer networks for distributing streams avoids this traffic concentration but, on the other hand, poses new challenges, as peers can be unreliable, presenting highly dynamic behavior and leaving the system at any time without prior notice. This work presents a hybrid strategy that uses the set of clients as an unreliable
P2P network to assist the distribution of streaming data. A
client can always return to the server whenever peers do not
work as expected. A system prototype was implemented and
experimental results show a significant reduction of the network traffic at the content source. Experiments also show
the behavior of the system in the presence of peer crashes.
In this work we propose a client-server multimedia streaming strategy that employs a P2P network for reducing the bandwidth requirements at the server. Typically, a server transmits the data to several clients in parallel. As
clients receive the content, they build a peer-to-peer stream
distribution network and exchange different pieces of the
received data. This approach is similar to the one employed
by BitTorrent [4]. The server delivers different parts of the
data to the clients, each of which is eventually able to obtain the whole content by exchanging its received data with
other clients that received different parts of the same content.
1. Introduction
In live multimedia streaming systems the content is produced at the server in real time and must be available to the
clients within a maximum time limit, in order to be successfully used by an application that consumes the data at a
fixed rate. In this case “old” data is not useful for clients.
This implies that the variety of parts that are delivered to
clients that start receiving the content at roughly the same
time is small, as each client typically keeps data spanning
at most a few minutes in its local buffer.
The efficient and reliable distribution of live multimedia
streams to a very large number of users connected to a wide
area network, such as a corporate WAN or the Internet itself, is still an open problem. As the traditional client-server
paradigm does not offer the required scalability, peer-topeer (P2P) networks have been increasingly investigated as
an alternative for this type of content distribution [1, 3, 6, 7].
In the client-server model, the server is the central data
source from which data is obtained by clients. All transmissions necessarily demand that clients connect to the server,
which can easily become a performance bottleneck. On
the other hand, the client-server model also has advantages,
such as the ability of the content owner to control the delivery, for instance blocking some clients or employing a
policy-based delivery strategy. This property is especially
welcome when different clients have different quality of service requirements. Several on-line providers currently deliver live audio and video streams using tools based on the
client-server paradigm.
The proposed multimedia content distribution system is
actually hybrid, as it presents characteristics both of the
client-server and P2P paradigms. The server is responsible for creating the content, temporarily storing it in a local buffer, splitting the buffer in small parts and transmitting these parts to a group of clients. These clients interact
among themselves and with other clients that did not receive the data directly from the server to set up forwarding
agreements to exchange the received data. The forwarding agreements are refreshed from time to time to adjust to
changes in the system. We assume that the P2P system
is dynamic and has very low reliability, as clients acting as
peers can leave the system at any time. As the client buffer
must be ready for playback within a bounded time interval,
whenever a peer from which a client was supposed to obtain
content fails or leaves the system, the client can obtain that
piece of content directly from the server and within a large
enough time frame so that data playback is not compromised. The proposed approach was implemented as a prototype, and experimental results are described showing the
improvement on the bandwidth requirements at the server,
as well as the system robustness in the presence of peer failures and departures.
Related work includes several P2P multimedia streaming strategies, such as [2, 3] that are modified versions of
the BitTorrent protocol for continuous stream distribution;
[5] presents a system that is also similar to BitTorrent
but employs network coding on the stream. Another common approach is to rely on a multicast tree for delivering
the stream, such as in [6]. Some strategies such as [1] focus
on VoD (Video on Demand), in which the user can execute
functions on the stream such as pausing or fast forwarding. Usually pure P2P strategies do not offer QoS guarantees, such as [7] in which peers are selected based on their
measured availability. The proposed approach is different
from pure P2P strategies because the server remains responsible for eventually sending the stream if it is not received
by clients from peers. Furthermore the server still interacts
with all clients at least once in every round in which
peer agreements are established.
The rest of this paper is organized as follows. In
section 2 the proposed strategy is described. Section 3
presents the implementation and experimental results. Section 4 concludes the paper.
2. The Proposed Hybrid Streaming Strategy
The proposed system has the basic components of the
traditional client-server model. The server generates and
sends the stream to the clients. The clients receive the
stream and play it back to the user. They also act as peers exchanging parts of the stream. The server helps clients to find
one another to exchange parts of the data. In this work the
terms client, node and peer are used interchangeably – but
“peer” is more often employed for a client that is using its
upload facilities to send part of the stream to another client.
Each client keeps a list of peers from which it tries to retrieve parts of the stream. If, for any reason, a client is
not able to retrieve a given part from any peer, it returns to
the server which provides the missing part. The stream is
divided in slices that have fixed size and are sequentially
identified. Slices are produced and consumed at a fixed
rate. A slice is further divided in blocks which have constant size. A block is identified by the slice identifier and its
offset within the slice. The stream is actually transmitted in blocks.
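The slice and block addressing just described can be sketched directly, with block identity as the pair (slice id, offset); this is an illustrative Python sketch, not the prototype's actual (Java) classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockId:
    slice_id: int   # sequential slice number
    offset: int     # position of the block within its slice

def split_slice(slice_id, payload, block_size):
    """Divide one slice into fixed-size blocks keyed by (slice_id, offset)."""
    return {BlockId(slice_id, i // block_size): payload[i:i + block_size]
            for i in range(0, len(payload), block_size)}

# Using the experimental configuration (8 blocks of 4 KB per slice):
blocks = split_slice(20, bytes(32 * 1024), 4 * 1024)  # offsets 0..7
```

Keying blocks by (slice id, offset) is what lets a block travel through any route, peer or server, and still be reassembled into the right slice at the client.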
Clients send blocks to their peers according to forwarding agreements they establish. Each forwarding agreement
specifies the transmission of blocks with the same identifier, for a certain number of slices. All forwarding agreements last the same number of slices, and each agreement
starts at slices whose identifier is multiple of this number.
Agreements are established in rounds; a round starts after all
the slices of the previous agreement have been completely
transmitted. The first slice of each agreement is called its
base-slice. Whenever a client is unable to establish agreements with other peers, it still can establish an agreement
with the server itself. Figure 1 depicts the blocks to be
transferred in an example forwarding agreement. In the example, the current node receives from Node X blocks with
identifier 3, starting from slice 20, and the agreement lasts 10 slices.
Figure 1. Forwarding agreement example.
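Because all agreements span the same number of slices and start at a multiple of that number, the base-slice covering any given slice follows from integer arithmetic, as this small sketch (using the example's 10-slice agreements) shows:

```python
AGREEMENT_LEN = 10  # slices per agreement, as in the example

def base_slice(slice_id, agreement_len=AGREEMENT_LEN):
    """First slice (base-slice) of the agreement that covers slice_id."""
    return (slice_id // agreement_len) * agreement_len

def covered(slice_id, agreement_base, agreement_len=AGREEMENT_LEN):
    """True if slice_id falls inside the agreement at agreement_base."""
    return agreement_base <= slice_id < agreement_base + agreement_len

# The agreement with base-slice 20 covers slices 20..29:
assert base_slice(23) == 20 and covered(23, 20)
```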
Figure 2 shows the transmission of a slice. At the server,
the media producer generates the content which is then divided in slices (1). Each slice has a sequential number and
is further divided in blocks (2). In the example, each slice
is divided in four blocks. The blocks are then transmitted
to the clients (3). This transmission may take place directly
from the server to the client or from a peer. Each block
may take a different route to the client. After receiving the
blocks, the client rebuilds the slice (4) and plays it back to
the user (5).
Figure 2. Blocks form the basic transmission unit.
The rate at which the slices are generated and their size
are configurable parameters of the system. After a slice is
sent to the clients, the server still keeps it stored in a local buffer, where it is available in case the server receives
a retransmission request. The slices are held in this buffer
at least until the next agreement is established. Figure 3
shows an example message flow as a client starts up. To
make it simple, each slice is divided in only two blocks.
When the client initializes, it registers itself at the server
(1). The server accepts the connection and adds the client
to a list of active clients. After that, the server sends back
to the client information about the stream (2). This information is used to configure the client’s playback engine and
may include, for instance, the rate a slice should be consumed at and the size of each block. Afterwards, the client
sends a request for information about other peers and the
server replies with a set of peer identifiers randomly chosen
from the active client list (3). Besides helping clients to find
each other, the server also sends blocks directly to several
clients. In particular, after a new client starts up, the server
creates forwarding agreements for all parts of the slice. These agreements are valid until the next round, in which the client can try to establish agreements with other peers (4); in this way a new client quickly starts receiving the stream.
The clients that establish forwarding agreements directly
with the server for a given base-slice are selected before a
new round of agreements start. In this way, peers can establish the new agreements before the next base-slice gets
transmitted. After choosing the peers that will receive the spontaneous agreements, the server notifies these peers with
information about the blocks they are going to receive. The
server then notifies all connected clients that there are forwarding agreements available for the next round and the
peers can begin the agreement establishment phase as described below.
Figure 4. Monitoring message.
Figure 3. Message flow as a client starts up.
As the stream is generated at the server, the server also has to send all the blocks to a subset of the peers, which then exchange these blocks so that all clients eventually receive the complete stream. The larger the number of blocks sent by the server directly to the clients, the easier it is for each peer to find other peers from which it can receive the blocks it needs. We say that the server spontaneously chooses a number of clients with which direct agreements are established. This number of clients is usually a fraction of the total number of connected clients, and is a configurable parameter. The server can employ different approaches for selecting the clients for those agreements; for instance, the selection can be based on the RTT (Round-Trip Time) to the client.
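One way to realize the spontaneous selection, assuming RTT-based ranking as suggested above, is to sort clients by measured RTT and take the configured fraction (40% in the experiments reported later); a hypothetical sketch:

```python
def select_spontaneous(rtts, fraction=0.4):
    """Pick the configured fraction of clients with the lowest RTT
    for direct (spontaneous) agreements with the server.
    rtts maps client id -> measured round-trip time."""
    k = max(1, round(len(rtts) * fraction))
    return sorted(rtts, key=rtts.get)[:k]

rtts = {"A": 12.0, "B": 48.0, "C": 20.0, "D": 35.0, "E": 15.0}
print(select_spontaneous(rtts))  # ['A', 'E'] — the two lowest-RTT clients
```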
Every client keeps for each block a list of peers from
which the client can try to establish an agreement to receive
the block. This information is also used by a monitoring
procedure. Periodically, each peer exchanges monitoring
messages with the other peers in their lists. These messages
contain estimates of the delay between the
creation of a new slice at the server and the expected arrival
at the peer. As each block may go through a different route,
there are different estimations for each block, as shown in
the example in figure 4. Monitoring messages are padded so that they have exactly the same size as one block of the stream, so the peer can measure the time spent to retrieve a
block (a monitoring message) from each peer. These measurements are taken during the agreement establishment phase.
When a client receives a notification from the server
that there are forwarding agreements available for the next
round, it creates a new peer list for each block. Each list
is sorted by the estimated time to receive the data from the
peer, computed as the sum of the time informed by the peer
in its monitoring message and the time spent to retrieve the
monitoring message from the peer. Furthermore, the client
starts a timer that shows the end of the agreement establishment phase.
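The resulting per-block ordering can be sketched as follows, where each peer's estimate is the delay it reported plus the measured time to retrieve its block-sized monitoring message (illustrative code, not the prototype's implementation):

```python
def build_peer_list(reports):
    """Order candidate peers for one block by estimated delivery time.
    reports maps peer id -> (delay the peer reported in its monitoring
    message, time we measured to retrieve that message from the peer)."""
    return sorted(reports, key=lambda p: reports[p][0] + reports[p][1])

# (reported server-to-peer delay, measured retrieval time), per peer:
reports = {"X": (0.30, 0.12), "Y": (0.20, 0.35), "Z": (0.25, 0.10)}
print(build_peer_list(reports))  # ['Z', 'X', 'Y']
```

Note that the peer reporting the lowest delay ("Y") is not necessarily the best candidate once the retrieval time toward this client is added in.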
2.1. Agreement Establishment Phase
The client begins the agreement establishment phase
sending a request to the first peer in each block peer list. If
the peer accepts the request, the agreement is complete for
that block until the next round of agreements is started. If
the peer rejects the request, the client sends a request to the
next neighbor in the list. The peers that reject the request
are moved to the tail of the list, so another request is sent
again if no other peer replies positively. As a node cannot
accept an agreement to forward a block for which it does
not have itself an established agreement, the delay between
two requests may be enough for the peer to obtain the agreement and thus be able to accept the request. This process
is repeated until the client obtains forwarding agreements
for all blocks or a timer expires showing that the agreement
phase is over. If at the end of the agreement phase the client
was unable to establish forwarding agreements for a block,
it sends a request directly to the server. In this case, the
client also sends a request for information about more peers
to expand its block lists, in order to have a larger number of
peers to try to establish agreements with in the next round,
increasing the chance of success.
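The request loop above, with rejected peers moved to the tail and the server as the final fallback, can be sketched as follows (the max_tries bound stands in for the phase timer — an illustrative simplification):

```python
from collections import deque

def establish(peer_list, accepts, max_tries):
    """Request an agreement from peers in list order; a rejecting peer
    goes to the tail so it can be retried later (it may have obtained
    the upstream agreement it needs in the meantime). Returns the
    accepting peer, or None, in which case the caller requests the
    block directly from the server."""
    queue = deque(peer_list)
    for _ in range(max_tries):
        if not queue:
            break
        peer = queue.popleft()
        if accepts(peer):
            return peer
        queue.append(peer)   # moved to the tail, retried later
    return None              # fall back to the server
```

For example, a peer that rejects the first request because it lacks its own agreement may accept on the retry once that agreement is in place.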
When a client receives an agreement request, it checks
whether it has already established an agreement for the
specified block and base-slice with another peer or the
server itself. If there is such an agreement, the client checks the number of peers for which it has already accepted agreements to forward the block. Each client accepts at most a maximum number of forwarding agreements. If this number has not been reached yet, the request is accepted; otherwise it is rejected. This maximum number of agreements that a peer can accept is a configurable parameter of the algorithm. Figure 5 shows an example of the agreement establishment phase, which is executed by every client after it receives the notification from the server. In the example, node A received a spontaneous agreement from the server for block 1, node Y for block 3 and node Z for block 2.
Afterwards, node A creates (1) a peer list for block 2 and
another peer list for block 3. Node A sends an initial request for block 2 to peer Y (2), which is unable to accept
the request as it does not have an agreement to receive that
data. The request is then sent to node Z, the next in the list
(3). Node Z accepts the agreement and node A adds the
agreement (4) to its list of established agreements.
Figure 5. Agreement establishment.
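The accept/reject decision a peer makes when it receives a request can be sketched as a simple predicate (the data structures here are illustrative):

```python
def accept_request(block, base_slice, own_agreements, forward_counts,
                   max_forward):
    """A peer accepts a forwarding request only if it already has an
    upstream agreement for that (block, base-slice) pair and has not
    yet reached its configured maximum of accepted forwarding
    agreements for it."""
    if (block, base_slice) not in own_agreements:
        return False                      # nothing to forward yet
    return forward_counts.get((block, base_slice), 0) < max_forward
```

This is why node Y in the example rejects the request for block 2: it has no upstream agreement for that block, only for block 3.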
2.2. System Behavior in the Presence of Peer
Crashes or Departures
The slices received by the clients are stored in a buffer,
from where they are consumed by the playback engine at a
constant rate after an initial delay. This initial delay gives
a certain flexibility on the arrival times of different blocks.
Before consuming a slice, the client must ensure that all
blocks have been correctly received. If a block is missing,
for example because the peer the block was supposed to
come from has crashed or left the system, the client requests
the missing block directly from the server. This process must
be performed early enough so that all missing blocks can be
retrieved from the server.
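The completeness check and server fallback before playback can be sketched as follows (illustrative Python, not the prototype's Java code):

```python
def missing_blocks(slice_blocks, blocks_per_slice):
    """Offsets still absent from a slice buffer; these must be fetched
    from the server before the slice can be played back."""
    return [o for o in range(blocks_per_slice) if o not in slice_blocks]

def ready_for_playback(slice_blocks, blocks_per_slice, fetch_from_server):
    """Fill any gaps via the server fallback, then allow consumption."""
    for offset in missing_blocks(slice_blocks, blocks_per_slice):
        slice_blocks[offset] = fetch_from_server(offset)
    return len(slice_blocks) == blocks_per_slice

buf = {0: b"a", 2: b"c"}            # blocks 1 and 3 lost with a crashed peer
ready_for_playback(buf, 4, lambda o: b"srv")  # fetches offsets 1 and 3
```

The initial playback delay is what gives this repair step enough slack to complete before the slice is consumed.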
Figure 6. System behavior in the presence of
peer crashes.
Figure 6 shows an example. A peer crashes and does
not send blocks with a certain offset after slice 7. Slices
from 8 to 10 are thus incomplete. Before consuming each
slice, the client requests the missing blocks from the server,
and only consumes the slice after the missing blocks are received. This procedure is repeated for all slices until the
next round, in which the client will establish new forwarding agreements.
3. Experimental Results
This section describes an implementation of the proposed hybrid system and experimental results. The system
was implemented in Java, with all messages exchanged by
peers modeled as Java objects that are serialized and transmitted over TCP/IP connections. All network operations are
handled by a wrapper class that also allows artificial bandwidth limits to be set and collects statistics on the amount
of data sent and received. These artificial bandwidth limits
allow the simulation of several client instances running at the same host. All experiments involved the transmission of a
128 Kbps stream. A slice is composed of 8 blocks of 4 KBytes each; these blocks are played back every 2 seconds.
Forwarding agreements last for 10 slices and the server
makes spontaneous forwarding agreements with 40% of the
clients after sending the fifth slice of each agreement.
# Agrmts      1          2          3          4          5      Pure Client-Server
8 Peers    5637 KB    5620 KB    6125 KB    6380 KB    6880 KB       23040 KB
16 Peers   8049 KB   10803 KB   13486 KB   16186 KB   18853 KB       46080 KB
32 Peers  12809 KB   15249 KB   17982 KB   20710 KB   23297 KB       92160 KB
3.1. The Reduction of Server Bandwidth Required
The first experiment shows the bandwidth requirement at the server as the number of spontaneous agreements with clients varies. From 1 to 5 agreements were established with 8, 16 and 32 clients. The system was executed in each case
for 180 seconds. The results were measured at the server
and consider all data sent, including control messages. In
all cases, the playback interruption rate is below 1%, that is,
more than 99% of the slices were available for playback at
the expected time. Results are shown in figure 7 and table
1. For the varying number of clients, the total amount of
data sent by the server is shown. The last column shows the
minimum amount of data that would be transmitted using a
pure client-server approach.
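From the figures in Table 1, the saving relative to a pure client-server transfer is direct arithmetic; for instance, with 32 peers and one spontaneous agreement:

```python
def savings(hybrid_kb, pure_kb):
    """Fraction of server traffic avoided versus pure client-server."""
    return 1.0 - hybrid_kb / pure_kb

# 32 peers, 1 spontaneous agreement: 12809 KB sent by the server,
# against 92160 KB for the pure client-server baseline.
print(f"{savings(12809, 92160):.0%}")  # 86%
```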
# Agrmts     1         2         3         4         5
16 Peers  3639 KB   3523 KB   3366 KB   3201 KB   3029 KB
32 Peers  3943 KB   3805 KB   3692 KB   3657 KB   3620 KB
Table 2. Number of bytes sent by peers given
the server spontaneous agreements.
Table 1. Server bandwidth requirement given
the number of spontaneous agreements.
The number of times the same data was transmitted by the server was varied in the experiments. All clients were configured to have an artificial bandwidth limit of 1024 Kbps. The experiments were run several times and the results shown are representative of the values obtained.
The rest of this section describes three experiments, in
which the following metrics were evaluated: (1) the reduction of server bandwidth required, (2) the influence of the
number of copies sent by the server on the upload bandwidth
requirement at the clients, and (3) the system behavior in
the presence of peer crashes.
# Agrmts     1         2         3         4         5
8 Peers   3313 KB   3302 KB   3297 KB   3317 KB   3292 KB
3.2. Peer Upload Bandwidth Versus Spontaneous Server Agreements
The larger the number of spontaneous agreements the server establishes with the clients, the less data the peers need to upload. The graph in figure 8 and table 2 show the average amount of data peers upload as the number of spontaneous agreements established by the server varies. The figures are averages of the amount of data sent by each peer and include control and monitoring messages. It is possible to note that for 16 and 32 clients there is a slight reduction in the upload requirements. For 8 clients the value remained nearly constant, as the copies sent by the server are distributed among only 40% of the clients. Using these parameters, it is possible to see that a system with only 8 clients does not take advantage of the additional copies sent by the server.
Figure 8. Number of bytes sent by peers given
the server spontaneous agreements.
3.3. System Behavior in the Presence of Peer Crashes
Figure 7. Server bandwidth requirement given
the number of spontaneous agreements.
The third experiment shows the system behavior in the presence of peer crashes. The system had the same configuration as the previous experiment, and the server established 3 spontaneous forwarding agreements. The network was composed of 32 peers, of which 16 randomly crashed at different instants of time. Clients that were supposed to receive data from a peer that crashed used the retransmission mechanism to request the missing blocks from the server. The system was able to keep the playback interruption rate below 1% at all working clients, even in the presence of failures of half of all peers; in other words, more than 99% of the slices were available when needed. The retransmission of missing blocks increased the bandwidth usage at the server, as shown in figure 9 and discussed below.

4. Conclusion

This paper presented a hybrid client-server multimedia
streaming system which is assisted by an unreliable P2P
network formed by clients. Clients receiving the content
establish forwarding agreements for parts of the stream,
avoiding the concentration of network traffic at the source.
Clients can always return to the server when they are unable to retrieve the stream from a peer. Experiments show that the system provides a significant reduction in the bandwidth requirements at the server, and that the system is at the same time able to support peer crashes and departures without significant interruption of the playback at working clients.
Future work includes allowing clients to have different QoS requirements and different bandwidth limits for downloading and uploading data. In the proposed strategy, clients are randomly selected as peers; this can be improved, for instance, by using a location-aware peer selection strategy. The prototype we implemented allowed basic experiments to be performed showing that the strategy works as expected; nevertheless, large-scale experiments must still be run comparing the system with pure P2P streaming systems, and also checking the limits on the system scalability.
[Plot: Data Sent by the Server (KB) versus Time (Slices Produced), in the presence of failures.]
Figure 9. Required bandwidth at the server in
the presence of peer crashes.
In figure 9 it is possible to note that a large amount of
data is transmitted from the beginning up to the creation of
slice 10. This high volume reflects the initial data sent by
the server to new clients after the connection is established.
In the experiment, all clients started at the same time in the
very beginning of the run. Nevertheless, in real world scenarios, the clients are expected to connect at different time
instants, and this reduces the amount of data the server has
to send to all clients simultaneously.
[1] Y. Huang, T. Fu, D. Chiu, J. C. Lui, C. Huang, “Challenges, Design and Analysis of a Large-Scale P2P-VoD
System,” SIGCOMM Comput. Commun. Rev., Vol. 38,
No. 4, 2008.
[2] N. Carlsson, D. L. Eager, “Peer-assisted On-demand
Streaming of Stored Media Using BitTorrent-like Protocols,” IFIP/TC6 Networking, 2007.
[3] P. Shah, and J.-F. Paris, “Peer-to-Peer Multimedia
Streaming Using BitTorrent,” IEEE Int. Performance,
Computing and Communications Conference, 2007.
Clients begin to take advantage of the forwarding agreements after the creation of slice 10. Then the system presents a stable behavior up to the point a peer crashes, which occurs close to slice 45. During this time, it is possible to note some small peaks caused by monitoring and control messages. After the period in which the system was under the effect of the failure, those peaks are smaller, as the number of active clients exchanging control and monitoring messages with the server has been reduced.
[4] B. Cohen, “Incentives Build Robustness in BitTorrent,”
Workshop on Economics of Peer-to-Peer Systems, 2003.
[5] C. Gkantsidis and P. Rodriguez, “Network Coding for
Large Scale Content Distribution,” INFOCOM, 2005.
[6] D. A. Tran, K. A. Hua, T. Do, “ZIGZAG: An Efficient
Peer-to-Peer Scheme for Media Streaming,” 22nd Annual Joint Conf. of the IEEE Comp. and Comm. Societies, 2003.
Although the crash occurs close to slice 45, the effects on the server are observed only close to slice 48. The reason for this delay is that clients were still playing back previously buffered data. When the buffer gets nearly empty, the clients request missing blocks from the server. The experiment continues and peers keep on crashing until slice 60, when a new round of agreements takes place.
[7] X. Zhang, J. Liu, B. Li, T. P. Yum, “CoolStreaming/DONet: A Data-driven Overlay Network for Peer-to-Peer Live Media Streaming,” INFOCOM, 2005.
Visual Programming of Content Processing Grid
Pierfrancesco Bellini, Ivan Bruno, Paolo Nesi
DISIT-DSI, Distributed Systems and Internet Technology Lab
Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy , [email protected], [email protected]
as those needed to compose different services and
Alternatively, the construction of scalable applications
could be done by using workflow of services [4], [5],
[6], [7]. These solutions are based on Workflow Management Systems (WfMS) and languages, owing to the relationships between Grid solutions and workflow solutions. On such grounds, they are unsuitable for semantic processing. Therefore, a tool to
define the activity flow visually, combining basic processes and integration aspects of communication, flow processing, data processing and semantic processing in a scalable grid environment, can be a valid help in the development of a new kind of Web 2.0 and new-media applications satisfying semantic processing and on-demand needs. Among
end-user grid programming tools [12], we can cite the Java Commodity Grid Toolkit (Java CoG Kit), which was created to assist in the development of applications using the well-known Globus Toolkit [11]; it is based on workflow programming via an XML language and the Karajan workflow engine. GAUGE is another tool developed to work with Globus [8]. It generates full application code and allows users to focus on a higher level of abstraction, avoiding low-level details.
Programming a grid for content processing is quite a complex activity, since it involves capabilities of semantic processing and reasoning about knowledge and descriptors of content, users, advertising, devices, communities, etc., as well as functional/coding data processing in an efficient manner on a scalable grid structure. In this paper, a formal model and tool to visually specify rule programming on the grid is presented. The tool has been developed on the basis of the AXMEDIS framework and grid tool, while it can be extended to support other formalisms generating processes for other grid environments.
1. Introduction
With the introduction of User Generated Content (UGC), back offices for content processing based on grid solutions need to be more intelligent, flexible and scalable to satisfy quickly growing applications such as the back-office activities of social networks. Grid computing provides high-performance applications and widely distributed resources. These functionalities are becoming mandatory for web portals and end-users' applications. End-user grids are frequently used by unskilled users with no professional background in computer programming. For them, building or modifying grid applications is a difficult and time-consuming task. To build new applications, end users need to deal with excessive details of low-level APIs that are often platform-specific and too complex for them [1].
Some programming strategies and methodologies have been proposed to bring the grid to end users. The Problem Solving Environment (PSE), or portal [2], [3], makes the use of the Grid easier by supplying a repository of ready-to-use applications that can be reused by defining different inputs. Grid complexities are hidden, thus allowing only simple tasks (e.g., job submission, job status checking). This solution does not provide
the required flexibility to create real applications such
1.1 Visual Processing for Media
Visual programming for media on Grid has to be able
to formalize and represent concurrence of activities
and the logic of services with an end-user oriented
solution to simplify the development of complex
applications. The visual tools have to be designed to
help grid users to develop the application processes
hiding the complexity and the technologies used
(coding, access to databases, communications, coding
format, parallel allocation, etc.). Such solutions are not only useful for Web 2.0/3.0 applications but also for many other massive applications.
In order to solve the above-mentioned problems related to visual programming tools for media grid processing, a solution has been defined and validated on the AXMEDIS grid and model, starting from the AXMEDIS framework code of the AXCP grid.
The AXCP grid allows the formalization of processes
for cross media content processing, semantic
processing, content production, packaging, protection
and distribution and much more [9], [10]. The work
reported in this paper is related to the experience
performed in defining a visual language for the
formalization of visual media processing for Grid
environments. The created visual model and tool can
be adopted in other Grids as well.
AXCP tools are supported by a plug-in technology. The AXCP Rule language features allow performing activities of ingestion, query and retrieval, storage, adaptation, extraction and processing of descriptors, transcoding, synchronisation, fingerprint estimation, watermarking, indexing, summarization, metadata manipulation and mapping via XSLT, packaging, protection and licensing in MPEG-21 and OMA, and publication and distribution via traditional channels and P2P.
AXCP Rules can be programmed by using the so-called Rule Editor, which is an IDE (Integrated Development Environment). The Rule Editor is too technical to be used by non-programmers, such as those who have to cope with the definition of content processing flows and activities in content production factories.
2. AXCP Grid framework overview
The AXCP grid comprises several Executors, allocated on several processors for executing content processes, and is managed by a Scheduler. AXCP processes are called Rules, and are formalized in an extended JavaScript [9], [10]. The processes can be directly written in JS, and/or the JS can be used to put other processes into execution. The Scheduler performs the rule firing, discovering grid Executors and managing possible problems/errors. The Scheduler (see Figure 1) may receive commands (to invoke sporadic or periodic rules with some parameters) and provide reporting information (e.g., notifications, exceptions, logs, etc.). The Scheduler exposes a Web Service which can be called by multiple applications such as web applications, WfMS, tools, and even other grid Rules on nodes of the AXCP.
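As a sketch of this interaction, the command that such a caller might send to the Scheduler can be modelled as a small payload builder. The function name, field names and activation modes below are illustrative assumptions, not the actual AXCP web service interface:

```javascript
// Hypothetical sketch of a client-side command for the Scheduler web service.
// "activateRule", the mode strings and the payload shape are assumptions.
function buildRuleCommand(ruleId, mode, params) {
  // mode: "sporadic" (run once now) or "periodic" (run on a schedule)
  if (mode !== "sporadic" && mode !== "periodic") {
    throw new Error("unsupported activation mode: " + mode);
  }
  return {
    command: "activateRule",
    ruleId: ruleId,
    mode: mode,
    parameters: params || {}
  };
}

// A caller (web application, WfMS, or another grid Rule) would serialize
// this command and post it to the Scheduler's web service endpoint.
var cmd = buildRuleCommand("FingerprintExtraction", "sporadic",
                           { folderIn: "/audio/in", fileExt: ".mp3" });
```

The real Scheduler interface is exposed as a Web Service, so the serialized form would follow its WSDL rather than this plain object.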
3. The design of the Grid Visual Designer
Before starting with the development of the visual language for the AXCP, we performed a detailed analysis of all the AXCP rules developed and collected over the last three years by several users of the AXCP. From the analysis, it was discovered that the Rules collected and analyzed (about 280) were of the following kinds:
A. 75% were single rules with a linear structure, presenting a sequence of activities to be performed. For each of them, when one of the activities fails the whole rule execution has to fail. To this category belong rules for automated content production on demand, licensing, content publication and/or repurposing, etc. These rules may or may not have to report a result to the calling process which requested the execution of the Rule.
B. 9% were rules activated by other Rules on the Grid in an asynchronous manner. Their mother rule does not need to wait for the result to continue its run. These rules, even if they are activated by another Rule, are structurally realized as rules of type A, since they start asynchronously and do not keep the main rule blocked.
C. 16% were rules activating/invoking other processing Rules by creating synchronous/asynchronous derived Rules, waiting or not for their completion to continue their execution.
Besides, we discovered that almost all rules present JS segments of functional blocks working on single content elements or on lists of them, performing specific activities. This analysis allowed us to identify a possible semantics for a visual programming language based on the composition of processing segments/blocks. Thus the Visual Program defined allows composing:
- single elements of the process (called JSBlocks) to create composed Rules allocated on the same processor node (covering rules of type A and B);
- branching activities (collections of RuleBlocks) which are allocated and executed on the Grid infrastructure by the scheduler, according to their dependencies. The Rules capable of activating other Rules cover the specific semantics of rules of type C, identified in the analysis.
The composition of these two models, plus the implementation of a set of ready-to-use functional blocks (JSBlocks or RuleBlocks), allowed us to cover the issues mentioned in Section 1.1 regarding: hierarchical structure, internal visual programming of a single process flow on a single executing node, complex and branched flows composed of several different processes allocated on different grid nodes, error code reporting, and visual processing of media.
Figure 1 – AXCP Architecture
The Executors receive the Rule to be executed from the Scheduler, and perform the initialization and launch of the Rule on the processor. During the run, the Executor can send notifications, errors and output messages. Furthermore, the Executor can invoke the execution of other Rules by sending a specific request to the Scheduler. This solution gains the advantages of a unified solution and allows enhancing the capabilities and the scalability of the AXMEDIS Content Processing [9], [10]. The entry point function itself can invoke different AXCP processing functions defined in the same JavaScript (the AXCP rule body). Skilled users may create their own JavaScript JSBlocks, augmenting their library.
3.1 Modeling JSBlocks, Single Elements
According to the above-reported analysis, a collection of visual blocks, organized into a common repository and divided into categories, has been created: Querying, Ingesting, Posting, Metadata processing, Decision taking, Adapting/transcoding, Manipulation, and Utility. In our visual programming model a generic block can be a segment of a JavaScript rule (a JSBlock) or a full RuleBlock (which in turn is created as a set of JSBlocks or directly coded in JavaScript). JSBlocks can be composed by connecting inputs and outputs according to their types, where each data value is an array that may contain a single element or a list of referred content or metadata elements. A JSBlock is characterized by a name (type name and instance) and a set of in/out parameters. A parameter can be marked as: (i) IN, when it is consumed inside the Rule; (ii) OUT, when it is consumed inside the Rule and can be used to pass back a result to the next processing segments (that is, IN/OUT); (iii) SETUP, when it is a reserved input used to set up block-specific behaviour. This parameter type is used to force different operating conditions in the Block, for example to pass the ID of the database to be used, the temporary directory, etc.
Semantically speaking, a JSBlock is translated into a JS procedure specifying an elementary processing activity. Parameters are typed (String, Real, Integer and Boolean), and arrays of data have to be modelled as a string containing a list of items formatted with specific separators. The JSBlocks are connected to one another according to their signatures and arguments.
3.2 Single Rule Visual Programming
JSBlocks can be combined to define the steps of a process flow corresponding to a grid processing rule, a RuleBlock (see Figure 2). The execution of JSBlocks is a sequential process flow directed by the Boolean result returned by the previous block. Therefore, the process can take only one direction and ends in one and only one of the leaves. The visual editor displays a green arrow for true and a red one for false; in this paper, we use a dashed arrow for false. The JSBlocks can be selected from the library of JSBlocks, which includes the functions listed in Section 3.1. A single JSBlock can be quite complex; for example, it could activate other RuleBlocks, which are processes on the grid, thus creating recursive or iterative patterns.
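As an illustration of this translation into a JS procedure, a JSBlock such as the getFileList block used in the examples later in the paper could be sketched as follows. The body simulates the folder scan with an in-memory array, and the separator convention is an assumption:

```javascript
// Minimal sketch of a JSBlock translated into a JS procedure, following the
// convention that list-valued data travels as a separator-joined string.
// getFileList and its parameter names echo the paper's examples; the body is
// illustrative only (a real block would scan a folder on disk).
var SEP = ";";

function getFileList(files, fileExt /* SETUP: extension filter */) {
  // IN: files, here simulated as a JS array instead of a real folder scan
  var matched = files.filter(function (f) {
    return f.endsWith(fileExt);
  });
  // OUT: filePathList, an array modelled as a separator-joined string
  return matched.join(SEP);
}
```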
Figure 2 – A sequence of JSBlocks creating RuleBlock SB22
The visual programmer creates the specification with a drag-and-drop approach, connecting blocks and setting, through dialog boxes, the in/out parameters of a JSBlock (either by editing constant values or by linking them with parameters of other blocks). In particular, JSBlock composition is based on forward and backward parameter propagation, with the aim of creating a RuleBlock, which is a rule to be allocated on a single node of the grid. The propagation allows linking the input parameters of a JSBlock with the IN/OUT parameters of its parents. With reference to Figure 2, the input parameters of JSBlocks D and E could be linked to the IN/OUT parameters of JSBlocks B and A, whereas JSBlock C sees only those of A. Backward propagation allows the definition of the IN/OUT parameters of the created RuleBlock by marking the input parameters of a JSBlock as global IN or OUT of the container RuleBlock.
Semantically speaking, the code generator starts by parsing the JSBlock sequence to produce a single RuleBlock, assembling all the JSBlocks, including their JS code and the maps of parameters among them. Then the IN and OUT parameter definitions are created and assigned to the signature of the RuleBlock implementing the call chain. Finally, the resulting RuleBlock, a JS Rule, is activated on the grid according to its parameters.
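The assembly step can be sketched, under the simplifying assumption of a purely linear flow (ignoring the true/false branching of the visual model), as a higher-order function that chains JSBlock procedures by forward parameter propagation; all names below are illustrative:

```javascript
// Hedged sketch of the call-chain assembly performed by the code generator:
// each JSBlock is a procedure, and the generator wires the OUT value of one
// block to the IN parameter of the next.
function assembleRuleBlock(blocks) {
  // Returns a single procedure (the RuleBlock body) whose IN parameter is
  // the first block's input and whose OUT is the last block's output.
  return function (input) {
    var value = input;
    for (var i = 0; i < blocks.length; i++) {
      value = blocks[i](value); // forward parameter propagation
    }
    return value;
  };
}

// Usage: two toy JSBlocks chained into one RuleBlock.
var trim = function (s) { return s.trim(); };
var upper = function (s) { return s.toUpperCase(); };
var rule = assembleRuleBlock([trim, upper]);
```

The real generator also emits the IN/OUT signature of the resulting RuleBlock and the parameter maps between blocks, which this sketch leaves out.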
At first glance, the visual programming semantics of RuleBlocks seems to have relevant limitations, but this is not entirely true, as put in evidence in the following. In fact, it should be considered that, on the basis of the above-described model: (i) iterations are internally managed within the single JSBlocks; (ii) decisions can be taken within the single JSBlock. A JSBlock can be regarded as a visual implementation of a selection and/or of a sequence of actions. On the other hand, the above semantics does not address the modeling of multiple branches in the graphs, and thus the management of multiple rules/processes.
Regarding IN/OUT parameter management and editing, a ManRuleBlock follows the same semantics as the RuleBlock, thanks to the forward and backward parameter propagation. Please note that the definition of a ManRuleBlock can be recursive, as depicted in Figure 3, in which ZRule calls another instance of ZRule via A1.
Semantically speaking, the code generator produces a JS Rule implementing the ManRuleBlock (e.g., ZRule in the example of Figure 3) for managing the activation of other RuleBlocks according to the graph, always respecting the assignment of parameters of the RuleBlocks. Please note that RuleBlocks are activated by using web service calls to the Scheduler. The generated Rule, ZRule, is the invoker and also the manager of the IN/OUT parameters, waiting for the answers/results of the called RuleBlocks in order to pass them to the others according to the flow.
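A hedged sketch of the Rule such a generator might emit for a ManRuleBlock like ZRule follows. Here `scheduler.activate` is a hypothetical stand-in for the Scheduler's web service call, not the real AXCP API:

```javascript
// Illustrative sketch of a generated ManRuleBlock body: it activates child
// RuleBlocks through the Scheduler and forwards results along the graph.
function makeManRule(scheduler) {
  return function ZRule(params) {
    // Synchronous activation: wait for A1's result before firing B2.
    var outA1 = scheduler.activate("A1", { folderIn: params.folderIn });
    // Forward A1's OUT parameter as B2's IN parameter.
    var outB2 = scheduler.activate("B2", { folderIn: outA1.folderOut });
    return outB2; // pass the final result back to the caller
  };
}
```

An asynchronous (fire-and-forget) activation would skip the wait on the child result, matching the type-B rules identified in the analysis.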
3.3 Managing RuleBlocks Rules on the grid
The proposed visual programming model has a specific modality to specify branched activations of rules on the grid, delegating to the grid scheduler the effective allocation of processes on nodes. To this purpose, a different visual model/semantics, with respect to the previous one, has been defined. It allows the construction of branched and distributed rules. In this case, the visual graph is a tree representing a set of processes and their activation relationships, as depicted in Figure 3.
4. Example of visual programming on the grid
In this section, an example of visual programming is reported. It implements an audio recognition process based on fingerprints, to recognize audio tracks on the basis of their audio fingerprint, for example when audio files are uploaded to a portal, in order to filter out User Generated Content (UGC) infringing copyright.
The first step was the identification of the basic procedures involved in audio file searching, fingerprint extraction, database insertion and searching. The JSBlocks were used to compose and generate RuleBlocks. Combining them, by linking parameters, the following RuleBlocks were built:
A) FingerprintExtraction (in::folderIn, in::fileExt, in::folderOut). It uses the getFileList, extractFingerprint and alert JSBlocks (see Figure 4). The fileList input parameter of the "fingerprint_extraction" procedure is associated with the filePathList out parameter of "getFileList". The folderIn, fileExt (getFileList) and folderOut (extractFingerprint) are back-propagated as input parameters for the new "FingerprintExtraction" RuleBlock.
Figure 3 – ZRule ManRuleBlock defined to manage Rules on the grid
A Managing RuleBlock (ManRuleBlock), ZRule, is created, for example, to activate the execution of RuleBlocks A1 and B2 in a sequential (or parallel) way on the grid and to return their output parameters to the ZRule process synchronously (or asynchronously). Even in this case, the single RuleBlock can be selected from a library or can be created by using:
- the visual model of Section 3.2 (see Figure 2);
- the AXCP Editor for JavaScript, as a single JS Rule to be used as a block in the library of rules;
- another ManRuleBlock, defined by using the visual programming model depicted in Figure 3 (for example, one of the children of ZRule is an instance of ZRule itself, invoked by A1 with some parameters).
Figure 4 – RuleBlock FingerprintExtraction
The fileExt of "FingerprintExtraction" is associated with the fileExt of the new RuleBlock, in order to define the wildcard for audio files and get all the files stored in folderIn.
The SearchFingerprintInDatabase RuleBlock runs when an unknown audio file has to be identified by searching for its fingerprint inside the database. This new RuleBlock uses (see Figure 7):
- FingerprintExtraction(in::folderIn, in::fileExt, in::folderOut)
- SearchIntoDB(in::folderIn, in::fileExt, in::resultFilePath, in::dbID)
B) InsertIntoDB (in::folderIn, in::fileExt). This RuleBlock uses the getFileList, insertFingerprint and alert JSBlocks (see Figure 5). The fileList input parameter of the "insertFingerprint" procedure is associated with the filePathList out parameter of "getFileList".
Figure 5 – RuleBlock InsertIntoDB
The folderIn and fileExt (getFileList) are back-propagated as input parameters for the "InsertIntoDB" RuleBlock.
C) SearchIntoDB (in::folderIn, in::fileExt, in::resultFilePath, in::dbID). It uses the getFileList, searchFingerprint and alert JSBlocks (see Figure 6). The fileList input parameter of the "searchFingerprint" procedure is associated with the filePathList out parameter of "getFileList". The folderIn, fileExt (getFileList), resultFilePath and dbID (searchFingerprint) are back-propagated as input parameters for the "SearchIntoDB" RuleBlock.
Figure 7 – ManRuleBlock SearchFingerprintInDatabase
The presented RuleBlocks were used inside the AXMEDIS GRID environment to populate a fingerprint database starting from a large collection of audio tracks. The fingerprint extraction algorithm works on MP3 and WAVE audio formats, normalizing the audio features (sample rate, number of channels and bits per sample) when necessary, and generates a fingerprint by using the Spectral Flatness descriptor. The production of the experiments has been quite fast and simple for the visual programmer. Several other ManRuleBlocks have been defined to replicate, via visual programming, real grid rules provided by
Figure 6 – RuleBlock SearchIntoDB
By using the ManRuleBlock visual programming model, the above RuleBlocks were used to build more complex rules. AddNewFingerprint is put in execution every time new audio files are added to the repository; it then extracts fingerprints and inserts them into the database. This new RuleBlock uses the following:
- FingerprintExtraction(in::folderIn, in::fileExt, in::folderOut)
- InsertIntoDB(in::folderIn, in::fileExt)
The folderIn and fileExt (FingerprintExtraction) are back-propagated as input parameters. The folderIn input of the "InsertIntoDB" rule and the folderOut input of "FingerprintExtraction" are both associated with the folderOut parameter of the new RuleBlock.
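The parameter linking of the fingerprint pipeline can be sketched as follows. The two stand-in functions only model the signatures and the folderOut-to-folderIn association, not the actual fingerprint processing:

```javascript
// Sketch of the AddNewFingerprint composition: the folderOut of
// FingerprintExtraction feeds the folderIn of InsertIntoDB. Bodies are
// illustrative stand-ins for the real RuleBlocks.
function fingerprintExtraction(folderIn, fileExt, folderOut) {
  // ...extract fingerprints of matching audio files into folderOut...
  return { folderOut: folderOut };
}

function insertIntoDB(folderIn, fileExt) {
  // ...insert the fingerprints found in folderIn into the database...
  return { inserted: true, folderIn: folderIn };
}

function addNewFingerprint(folderIn, fileExt, folderOut) {
  var fp = fingerprintExtraction(folderIn, fileExt, folderOut);
  return insertIntoDB(fp.folderOut, fileExt); // linked parameters
}
```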
The proposed visual language and semantic model allowed us to cope with almost all the grid patterns identified, ranging from sequential to parallel execution, asynchronous and synchronous invocations, recursive and iterative patterns, etc. The approach proved quite effective and usable. It has been strongly appreciated, since users can customize the single JSBlocks and may create their own Blocks according to the specific application domain in which they have to work. Additional modeling work is needed to manage the versioning of the visual elements and to allow semantic search of the Blocks in the database of reusable blocks. Presently, the search for the most suitable blocks is supported by a table that crosses the input and output parameters of each block with the main data items it processes. We have noticed that the reuse of blocks is mainly performed at the level of JSBlocks. ManRuleBlocks and RuleBlocks are quite frequently versioned, adding more parameters and pushing them to become more general so as to be reusable on several occasions; this can be a problem, since the same Block can be used by several rules.
5. Conclusions
In this paper, a visual programming model for content processing grids has been proposed. It has been designed for general grid processing and implemented, for validation, on the AXCP grid open solution. The features have been identified by analysing a large set of real processing grid rules. The derived model satisfied 97% of them; on the other hand, the code generator allows accessing the code to adjust the uncovered 3% of rules. The remodelled rules have been tested against the optimized, manually created rules. In most cases the visually produced rules present lower performance than the original ones. Further work is in progress to add constructs that may enable the visual programmer to manage more simply error recovery in the ManRuleBlock semantics, and to define rules that are activated by multiple firing conditions. Presently these issues are managed at the level of code with specific JSBlocks, while in some cases they are constructs that should be visible at the higher level of grid programming. The full documentation can be recovered on the AXMEDIS web site.
The authors would like to thank all the AXMEDIS partners for their contributions. Most of the work reported has been performed after the AXMEDIS project completion and offered to the framework, which is still growing.
References
[1] Zhiwei Xu, Chengchun Shu, Haiyan Yu, Haozhi Liu, "An Agile Programming Model for Grid End Users", Proceedings of the Sixth Int. Conf. on Parallel and Distributed Computing, Applications and Technologies (PDCAT'05).
[2] Special Issue: Grid Computing Environments. Concurrency and Computation: Practice and Experience, 14:1035-1593, 2002.
[3] J. Novotny. The Grid Portal Development Kit. Concurrency and Computation: Practice and Experience, 14:1129-1144, 2002.
[4] E. Akarsu, F. Fox, W. Furmanski, and T. Haupt. WebFlow – high level programming environment and visual authoring toolkit for high performance distributed computing. In Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, pages 1-7, 1998.
[5] R. Armstrong, D. Gannon, A. Geist, K. Keahey, S. Kohn, L. McInnes, S. Parker, and B. Smolinski. Toward a common component architecture for high performance scientific computing. In Proc. of 8th IEEE Int. Symp. on High Performance Distributed Computing, 1999.
[6] E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, K. Blackburn, A. Lazzarini, A. Arbree, R. Cavanaugh, and S. Koranda. Mapping abstract complex workflows onto grid environments. Journal of Grid Computing, 1:25-39, 2003.
[7] M. Lorch and D. Kafura. Symphony – a Java-based composition and manipulation framework for computational grids. In Proc. of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2002), pages 136-143, 2002.
[8] F. Hernández, P. Bangalore, J. Gray, Z. Guan, and K. Reilly. GAUGE: Grid Automation and Generative Environment. Concurrency and Computation: Practice and Experience, 2005.
[9] P. Bellini, I. Bruno and P. Nesi, "A Distributed Environment for Automatic Multimedia Content Production based on GRID", in Proc. AXMEDIS 2005, Florence, Italy, 30/11-2/12, pp. 134-142, IEEE Press.
[10] P. Bellini, I. Bruno, P. Nesi, "A language and architecture for automating multimedia content production on grid", Proc. of the IEEE International Conference on Multimedia & Expo (ICME 2006), IEEE Press, Toronto, Canada, 9-12 July, 2006.
[11] F. Hernández, P. Bangalore, J. Gray, and K. Reilly. A graphical modeling environment for the generation of workflows for the Globus Toolkit. In V. Getov and T. Kielmann, editors, Component Models and Systems for Grid Applications, Proceedings of the Workshop on Component Models and Systems for Grid Applications held June 26, 2004 in Saint Malo, France, pages 79-96. Springer, 2005.
[12] Minglu Li, Jun Ni, Qianni Deng, Xian-He Sun, Grid and Cooperative Computing: Second International Workshop, GCC 2003, Shanghai, China, December 7-10, 2003, Revised Papers, Springer, 2004, ISBN 3540219889.
Interactive Multimedia Systems for
Technology-Enhanced Learning and Preservation
Kia Ng,1 Eleni Mikroyannidi,1 Bee Ong,1 Nicolas Esposito2 and David Giaretta3
1 ICSRiM - University of Leeds, School of Computing & School of Music, Leeds LS2 9JT, UK
2 Costech and Heudiasyc Labs, University of Technology of Compiègne and CNRS, Centre Pierre Guillaumat, 60200 Compiègne, France
3 STFC, Rutherford Appleton Laboratory, Oxfordshire OX11 0QX, UK
[email protected]
Abstract
Interaction technologies are affecting and contributing towards a wide range of developments in all subject areas, including the contemporary performing arts. These include performance, installation arts and technology-enhanced learning. Consequently, the preservation of interactive multimedia systems and performances is becoming important to ensure future re-performances as well as to preserve the artistic style and heritage of the art form. This paper presents two interactive multimedia projects for technology-enhanced learning, and discusses their preservation issues with an approach that is currently being developed by the CASPAR EC IST project.
Keywords: Technology-enhanced learning, Motion capture, Sensors, Multimodal, Digital Preservation, Ontologies.
1. Introduction
Interactive multimedia technologies and all forms of digital media are popularly used in contemporary performing arts, including musical compositions, installation arts, dance, etc. Typically, an Interactive Multimedia Performance (IMP) involves one or more performers who interact with a computer-based multimedia system making use of multimedia content. This content may be prepared and generated in real time and may include music, manipulated sound, animation, video, graphics, etc. The interactions between the performer(s) and the multimedia system can follow a wide range of different approaches, such as body motions [1, 2], movements of traditional musical instruments, sounds generated by these instruments [3, 4], tension of body muscles using bio-feedback [5], heart beats, sensor systems, and many others. These "signals" from performers are captured and processed by multimedia systems. Depending on the specific performance, the "signals" will be mapped to multimedia content for generation using a mapping strategy (see Figure 1). An example of an IMP process is the one adopted in the MvM (Music via Motion) interactive performance system, which produces music by capturing user motions [1, 6]. Interactive multimedia systems have been applied in a wide range of applications in this context. This paper presents two interactive multimedia systems that are designed for technology-enhanced learning for music performance (one for string instrument playing and one for conducting) and for interactive multimedia performance, and considers their preservation issues and complexity.
Figure 1: Interactive Multimedia Performance process
Generally, manipulating/recording multimedia content using computers is an essential part of a live interactive performance. Simply using performance outputs recorded in the form of audio and video media will not be sufficient for a proper analysis (e.g. for studying the effect of a particular performing gesture on the overall quality of the performance) or for the reconstruction of a performance at a later time. In this context, traditional music notation, as an abstract representation of a performance, is also not sufficient to store all the information and data required to reconstruct the performance. Therefore, in order to keep a performance alive through time, not only its output but also the whole production process used to create the output needs to be preserved.
The remainder of this paper is organized as follows. Section 2 presents two Interactive Multimedia Performance systems that need to be preserved. Section 3 introduces the conceptual model of the CASPAR project and the tools that are used for the preservation of the IMP systems. Finally, the paper is concluded in Section 4, where the next steps of future work are outlined.
2. Interactive Multimedia Performance (IMP) Systems
2.1. 3D Augmented Mirror (AMIR)
The 3D Augmented Mirror (AMIR) [7, 8, 9] is an IMP system being developed in the context of the i-Maestro project, for the analysis of gesture and posture in string practice training. String players often use mirrors to observe themselves practicing. More recently, video has also been used. However, this is generally not effective due to the inherent limitations of 2D perspective views of the media.
Playing an instrument is physical and requires careful coaching and training on the way a player positions himself/herself, with the aim of producing the best and most effective output with economical input, i.e. the least physical effort. In many ways, this can be studied with respect to sport sciences to enhance performance and to reduce self-inflicted injuries.
With the use of 3D motion capture technology, it is possible to enhance this practice by online and offline visualisation of the instrument and the performer in a 3D environment, together with precise and accurate motion analysis, to offer a more informed environment to the user for further self-awareness, and computer-assisted monitoring and analysis.
The 3D Augmented Mirror is designed to support the teaching and learning of bowing technique, by providing multimodal feedback based on real-time analysis of 3D motion capture data. Figure 2 shows a screenshot of the 3D Augmented Mirror interface, including synchronized video and motion capture data with 3D bowing visualisation.
When practicing using AMIR, a student can view the posture and gesture sequences (3D renderings of the recorded motion data) as prepared by the teacher, selecting viewpoints and studying the recording without the limitations of a normal 2D video. A student can also make use of the system to capture and study their own posture and gesture, or to compare them with selected models.
Figure 3: Gesture signature – tracing gesture for the analysis of composition.
It has been found that the AMIR multimodal recording, which includes 3D motion data, audio, video and other optional sensor data (e.g. balance), can be very useful for providing in-depth information beyond the classical audio-visual recording for musicological analysis (see Figure 3). Preservation of the IMP system is of great importance in order to allow future re-performance. The multimodal recording offers an additional level of detail for the preservation of musical gesture and performance that can be vital for the musicologist of the future. These considerations have motivated our work on the preservation of the AMIR multimodal recordings.
2.2. ICSRiM Conducting Interface
The ICSRiM Conducting System is another IMP system, developed for the tracking and analysis of a conductor's hand movements [10, 11]. Its aim is to help students learn and practice conducting.
Figure 2: Graphical Interface of the 3D Augmented Mirror
3. Preservation
Preserving the whole production process of an IMP is a challenging issue. In addition to the output multimedia contents, related digital contents such as mapping strategies, processing software and intermediate data created during the production process (e.g. data translated from the captured "signals") have to be preserved, together with all the configuration and software settings, changes (and their times), etc. Both multimedia systems presented in Section 2 generate similar types of datasets. The dataset usually consists of the captured 3D motion data, video and audio files, Max/MSP patches and additional configuration files. The reproduction of the IMP can be achieved through the correct connection of these components. Therefore, the most challenging problem is to preserve the knowledge about the logical and temporal relationships among these individual components, so that they can be properly assembled into a performance during the reconstruction process.
Another important aspect that needs to be preserved is the comments and feedback generated by the users or performers during the production of an IMP, regarding the quality of the performance and the techniques used. In the context of the CASPAR project, we have adopted an ontology-driven approach [13-15] that reuses and extends existing standards, such as the CIDOC Conceptual Reference Model (CIDOC-CRM) [16, 17], for the efficient preservation of an IMP.
Figure 4: Wii-based 3D capture setup.
A portable motion capture system composed of multiple Nintendo Wiimotes is used to capture the conductor's gestures. The Nintendo Wiimote has several advantages: it combines both optical and sensor-based motion tracking capabilities, and it is portable, affordable and easily attainable. The captured data are analyzed and presented to the user in an entertaining as well as pedagogically informed manner, highlighting important factors and offering helpful and informative monitoring for raising self-awareness, which can be used during a lesson or for self-practice. Figure 5 shows a screenshot of the Conducting System interface with one of the four main visualization modes.
3.1. Conceptual Model of CASPAR
The CASPAR framework is based on the full use of the OAIS (Open Archival Information System) Reference Model [18], which is an ISO standard. The OAIS conceptual model is shown in Figure 6. The conceptual model aims to provide an overall view of the way in which the project sees preservation working. It also helps to highlight the areas which can contribute to the formation of an interoperable and applicable structure that can effectively support digital preservation across the different CASPAR domains.
The most basic concept defined in the OAIS Reference Model is the Information Object. As illustrated in the UML diagram of Figure 6, an Information Object is composed of a Data Object and one or more layers of Representation Information. A Data Object can be a Physical Object (e.g. a painting) or a Digital Object (e.g. a JPEG image). Representation Information provides the necessary details for the interpretation of the bits contained within the digital object into meaningful information. For digital objects, representation information can be documentation about data formats and structures, and the relationships amongst different data components. Representation Information can also be software applications that are used to render or read the digital objects.
Figure 5: Graphical Interface of the ICSRiM Conducting System
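The composition described above can be sketched as plain JavaScript objects (JavaScript being one of the technologies the Cyclops tool is built on); the field names are illustrative, not part of the OAIS standard:

```javascript
// Minimal sketch of the OAIS Information Object: a Data Object paired with
// one or more layers of Representation Information. Field names are
// illustrative assumptions, not OAIS-mandated identifiers.
function makeInformationObject(dataObject, representationInfo) {
  if (!representationInfo || representationInfo.length === 0) {
    throw new Error("an Information Object needs Representation Information");
  }
  return { dataObject: dataObject, representationInfo: representationInfo };
}

// Example: a JPEG Digital Object whose Representation Information is
// documentation about its data format.
var photo = makeInformationObject(
  { kind: "DigitalObject", uri: "performance-still.jpg" },
  [{ kind: "FormatDocumentation", describes: "JPEG (ISO/IEC 10918-1)" }]
);
```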
The Cyclops tool is used to capture appropriate Representation Information from a high level, in order to enhance virtualization and future re-use of the IMP. It also offers the ability to add comments and annotations concerning any concept of the IMP. Figure 8 shows the graphical interface of the Cyclops tool and how it is used to create an IMP description. The tool provides a palette for creating the description of an IMP as a graph in the drawing area.
Figure 6: Basic concepts of OAIS Reference Model – Information Object [18]
Figure 7: The Architecture of the ICSRiM Archival System
In addition, the Representation Information needs to be connected with the knowledge base of the designated community. Ontology models offer the means for organizing and representing the semantics of this knowledge base.
Figure 9 shows in detail the graphical instantiation of
an IMP that was created with the use of the Cyclops tool.
The graph can capture information about the software and
hardware that was used as well as the components that
were produced (e.g. 3D motion data), as well as how these components are linked for the reproduction of an IMP.
The concepts of the diagram shown in Figure 9 can be mapped to the concepts of the CIDOC-CRM and FRBR ontologies.
However, the usable interface of the tool hides the
complexity of the system from the user. It uses a simple
high level language (concepts, relations, and types) which
is based on the terminology of the domain and does not
require any ontology expertise to create the instantiation.
The Cyclops canvas offers a graphical representation of
the life cycle to make its understanding easier.
Furthermore, Cyclops is a Web application, which facilitates portability. It is open source and uses the following
technologies: XUL, JavaScript, SVG, HTML, CSS,
XML, PHP, MySQL. Cyclops can be used as an
integrated component of the ICSRiM Archival System as
well as a standalone application.
The retrieval of an IMP is based on queries that are
applied on the Knowledge Base. In particular, the Web Archival system calls the FindingAids services, whose task is to perform RQL queries on the Representation Information Objects and return the results to the user. Every
Representation Information object is linked to a
corresponding dataset of an IMP stored in the Repository.
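The paper does not show the RQL queries themselves. As a rough, self-contained illustration of the FindingAids idea, the sketch below matches patterns over an in-memory set of triples that link Representation Information objects to IMP datasets; all identifiers and predicate names here are invented.

```python
# A toy triple store standing in for the Knowledge Base; the predicate and
# identifier names are hypothetical, not from CASPAR.
triples = [
    ("repinfo:42", "describes", "imp:conducting-2008"),
    ("repinfo:42", "storedAt", "repository:/imp/conducting-2008"),
    ("repinfo:43", "describes", "imp:3d-mirror-2007"),
]

def find(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Which Representation Information object describes the conducting IMP,
# and where is the corresponding dataset stored in the Repository?
(ri, _, _), = find(predicate="describes", obj="imp:conducting-2008")
(_, _, location), = find(subject=ri, predicate="storedAt")
assert location == "repository:/imp/conducting-2008"
```

A real deployment would express such lookups in RQL against the RDF knowledge base rather than over Python lists, but the pattern-matching shape of the query is the same.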
3.2 The ICSRiM Archival System
The Archival System has been developed by the University of Leeds and is used for the access, retrieval
and preservation of different IMPs. The architecture of
the Archival system is based on the OAIS conceptual
model and on the CASPAR Framework. In addition, the
Archival system integrates the appropriate CASPAR components as web services for the efficient preservation of the IMP.
The architecture of the Archival system is shown in
Figure 7. It has been designed in order to support the
preservation of different types of IMPs. Thus, it can be
used for both the 3D Augmented Mirror and the
Conducting System.
The archival system provides a web interface and its
backend communicates with a Repository containing the
IMPs and the necessary metadata for preserving the
IMPs. Before the ingestion of an IMP, it is necessary to
create its description based on the CIDOC-CRM and
FRBRoo ontologies. This information is generated in RDF/XML format with the use of the CASPAR Cyclops tool. Therefore, the user will be able to retrieve the IMP files s/he is interested in, together with their description.
We are currently working on the deployment of the
CASPAR components within the Archival System. In
particular, we are integrating software tools such as the
Semantic Web Knowledge Middleware [19], for
performing Information Retrieval tasks that will facilitate
the exploitation of our knowledge base.
5. Acknowledgements
Work partially supported by European Community
under the Information Society Technologies (IST)
programme of the 6th FP for RTD - project CASPAR.
The authors are solely responsible for the content of this
paper. It does not represent the opinion of the European
Community, and the European Community is not
responsible for any use that might be made of data
appearing therein.
The research is supported in part by the European
Commission under Contract IST-026883 I-MAESTRO.
The authors would like to acknowledge the EC IST FP6 for the partial funding of the I-MAESTRO project, and to express gratitude to all I-MAESTRO project partners and participants for their interest, contributions and collaboration.
The authors would also like to acknowledge David
Bradshaw for his work on the development of the
ICSRiM Conducting Interface.
Figure 8: The graphical interface of the Cyclops tool.
Figure 9: An IMP instantiation created with the Cyclops tool.
4. Conclusions and Future Work
The paper presented the CASPAR conceptual model and the tools that are used for the preservation of interactive multimedia performances. The approach of the project considers ontologies as a semantic knowledge base containing the necessary metadata for the preservation of IMPs.
The design of the system offers flexibility in preserving multiple IMP systems. In addition, the preservation of IMP systems could enhance the learning procedure, as it provides ways of capturing feedback and comments on the quality of an IMP. It also helps to preserve the intangible heritage that an IMP represents.
6. References
[1] K. C. Ng, "Music via Motion: Transdomain Mapping of Motion and Sound for Interactive Performances," Proceedings of the IEEE, vol. 92, 2004.
[2] R. Morales-Manzanares, E. F. Morales, R. B. Dannenberg, and J. Berger, "SICIB: An Interactive Music Composition System Using Body Movements," Computer Music Journal, vol. 25, pp. 25-36, 2001.
[3] D. Young, P. Nunn, and A. Vassiliev, "Composing for Hyperbow: A Collaboration between MIT and the Royal Academy of Music," in International Conference on New Interfaces for Musical Expression, Paris, France, 2006.
[4] D. Overholt, "The Overtone Violin," in International Conference on New Interfaces for Musical Expression, Vancouver, BC, Canada, 2005.
[5] Y. Nagashima, "Bio-Sensing Systems and Bio-Feedback Systems for Interactive Media Arts," in 2003 Conference on New Interfaces for Musical Expression (NIME-03), Montreal, Canada, 2003.
[6] MvM,
[7] K. Ng, "Technology-Enhanced Learning for Music with i-Maestro Framework and Tools," in Proceedings of EVA London 2008: the International Conference of Electronic Visualisation and the Arts, British Computer Society, 5 Southampton Street, London WC2E 7HA, UK, ISBN: 978-1-906124-07-6, 22-24 July 2008.
[8] K. Ng, "Interactive Feedbacks with Visualisation and Sonification for Technology-Enhanced Learning for Music Performance," in Proceedings of the 26th ACM International Conference on Design of Communication, SIGDOC 2008, Lisboa, Portugal, 22-24 September 2008.
[9] K. Ng, O. Larkin, T. Koerselman, B. Ong, D. Schwarz, and F. Bevilacqua, "The 3D Augmented Mirror: Motion Analysis for String Practice Training," in Proceedings of the International Computer Music Conference, Copenhagen, Denmark, 2007.
[10] D. Bradshaw and K. Ng, "Tracking Conductors Hand Movements using Multiple Wiimotes," in Proceedings of the International Conference on Automated Solutions for Cross Media Content and Multi-channel Distribution (AXMEDIS 2008), 17-19 Nov. 2008, Florence, Italy, pp. 93-99, DOI 10.1109/AXMEDIS.2008.40, IEEE Computer Society Press, ISBN: 978-0-7695-3406-0.
[11] D. Bradshaw and K. Ng, "Analyzing a Conductor’s Gestures with the Wiimote," in Proceedings of EVA London 2008: the International Conference of Electronic Visualisation and the Arts, British Computer Society, 5 Southampton Street, London WC2E 7HA, UK, 22-24 July 2008.
[12] K. Ng, T. V. Pham, B. Ong, A. Mikroyannidis, and D. Giaretta, "Preservation of Interactive Multimedia Performances," International Journal of Metadata, Semantics and Ontologies, vol. 3, no. 3, pp. 183-196, 2008, DOI 10.1504/IJMSO.2008.023567.
[13] K. Ng, A. Mikroyannidis, B. Ong, and D. Giaretta, "Practicing Ontology Modelling for Preservation of Interactive Multimedia Performances," in Proceedings of the International Conference on Automated Solutions for Cross Media Content and Multi-channel Distribution (AXMEDIS 2008), 17-19 Nov. 2008, Florence, Italy, pp. 276-281, DOI 10.1109/AXMEDIS.2008.43, IEEE Computer Society Press, ISBN: 978-0-7695-3406-0.
[14] K. Ng, T. V. Pham, B. Ong, A. Mikroyannidis, and D. Giaretta, "Ontology for Preservation of Interactive Multimedia Performances," in 2nd International Conference on Metadata and Semantics Research (MTSR 2007), Corfu, Greece, 2007.
[15] A. Mikroyannidis, B. Ong, K. Ng, and D. Giaretta, in Proceedings of the IEEE Mediterranean Electrotechnical Conference (MELECON 2008), Ajaccio, France, 2008.
[16] T. Gill, "Building semantic bridges between museums, libraries and archives: The CIDOC Conceptual Reference Model," First Monday, vol. 9, 2004.
[17] M. Doerr, "The CIDOC CRM – an Ontological Approach to Semantic Interoperability of Metadata," AI Magazine, vol. 24, 2003.
[18] Consultative Committee for Space Data Systems, "Reference Model for an Open Archival Information System (OAIS)," 2002.
[19] D. Zeginis, Y. Tzitzikas, and V. Christophides, "On the Foundations of Computing Deltas Between RDF Models," in 6th International Semantic Web Conference (ISWC 2007), Busan, Korea, 2007.
LoCa – Towards a Context-aware Infrastructure for eHealth Applications∗
Nadine Fröhlich (1), Marco Savini (2), Andreas Meier (2), Heiko Schuldt (1), Thorsten Möller (1), Joël Vogt (2)
(1) Department of Computer Science, University of Basel, Switzerland
(2) Department of Informatics, University of Fribourg, Switzerland
{nadine.froehlich, thorsten.moeller, heiko.schuldt}
{andreas.meier, marco.savini, joel.vogt}
New sensor technologies, powerful mobile devices and
wearable computers in conjunction with wireless communication standards have opened new possibilities for providing customized software solutions for medical professionals and patients. Today, medical professionals are usually equipped with much more powerful hardware and software than just a few years ago. The same is true for patients who, by making use of smart sensors and mobile devices for gathering, processing and analyzing data, can live independently in their home environment while receiving the degree of monitoring they would get in stationary care. All
these environments are highly dynamic, due to the inherent
mobility of users. Therefore, it is of utmost importance to
automatically adapt the underlying IT environment to the
current needs of its users – which might change over time
when user context evolves. In a digital home environment,
this requires the automatic customization of user interfaces
and the context-aware adaptation of monitoring workflows
for mobile patients. This paper introduces the LoCa project
which will provide a generic software infrastructure that is able to dynamically adapt user interfaces and service-based distributed applications (workflows) to the actual context of a
user (physician, caregiver, patient). In this paper, we focus
on the application of LoCa to monitoring the health state of
mobile patients in a digital home environment.
1. Introduction
Telemonitoring applications enable healthcare institutions to control therapies of patients in out-of-hospital settings. In particular, telemonitoring allows patients to live
as independently as possible in their digital home environment.
∗ The LoCa project is funded by the Hasler Foundation.
The goal is to support individual disease management by patient monitoring, which will result in less hospitalization and a higher quality of life. In the presence of
an increasingly aging population and a growing number of
people suffering from chronic ailments, this kind of application already has a high relevance for the healthcare system
and is expected to gain even more importance.
Monitoring includes the continuous gathering, processing and analysis of mainly physiological data coming from
sensors which are either integrated into the patient’s digital
home or attached to the patient’s body or clothes. Currently,
these monitoring applications are rarely automated. Configuration of the sensor environment, the customization for a
particular patient, and the actual data processing and analysis are mostly tedious manual tasks. In the LoCa project
(A Location and Context-aware eHealth Infrastructure), we
aim at providing a user-friendly and adaptable solution for
the automated gathering and analysis of relevant data for
monitoring patients. LoCa will be a general purpose system
that can be applied both in digital home environments and in
stationary care. A main feature in LoCa is the consideration
of context as a first class citizen. This means that monitoring applications and processes as well as user interfaces will
be dynamically adapted based on the user’s context (e.g.,
location, activity, etc.). Context-aware adaptations will result in more customized monitoring solutions and thus better support for data analysis and emergency assistance (e.g.,
triggering of emergency services in case of severe health
conditions). Dynamic adaptations will also make it possible to seamlessly apply best practices in health monitoring and patient control without explicit reconfigurations.
Consider, for instance, a sixty-five year old male patient
with cardiac problems in convalescence. During his recovery at home, his physician would like to control his state of
health and therefore needs to continuously receive data on
his physiological condition. At the moment, the patient’s
ECG is measured once a day and, additionally, whenever the patient does not feel well. For this, a nurse is sent
to the patient’s home to record ECG data and other measurements. The physician only receives raw data and has to
manually initiate all the steps needed for the interpretation
of raw data in a particular order, including a comparison of
the actual values with the patient’s medical history, to determine the individual development of physiological data.
In order to improve this situation, the patient is given a
smart shirt equipped with several sensors metering physiological parameters like ECG and blood glucose level. In
addition, the patient receives a smart phone with GPS sensor and camera. From the point of view of the patient, this
allows for almost unlimited mobility and does no longer require him to stay at home for the necessary measurements.
From the physician’s point of view, the smart shirt allows
for the continuous gathering of vital parameters and thus for
seamless monitoring in real time. As an important requirement for properly analyzing and interpreting metered data,
the physician needs to know the exact context of the measurement (e.g., the patient’s location and activity). Therefore, the shirt not only has to provide physiological data but
also details on his activity (e.g., by means of acceleration
sensors that can monitor the physical exercises he is doing).
The patient’s therapy includes a healthy diet, without alcohol and cigarettes, as well as physical exercises he is not
used to. Thus, he writes an electronic diary, extended with
photos of his meals, which finally helps in communicating
diet information and stress factors to his physician. Annotations to this diary, provided by the physician, support the patient in understanding the effects of his behavior on his therapy. Having access to raw sensor data does not yet allow
the physician to properly analyze the patient’s health state.
The data still has to be cleaned, possibly coarsened, and the different measurements analyzed in correlation with each other. For data analysis, the physician will follow a process consisting of dedicated processing steps in a pre-defined order. To ease her work she will use the LoCa system to define these workflows in a user-friendly way, thereby determining rules for data interpretation. Finally, she is able to define proper thresholds, for instance for critical blood pressure values in stress situations. In case a threshold is exceeded, the physician will be visually advised on her screen or will receive an SMS. It
is important to note that neither the analysis processes nor
the corresponding user interfaces are static but need to be
automatically adapted as soon as the context of the patient
changes (e.g., when a different set of sensors is available),
or in the course of the therapy when further parameters need
to be taken into account.
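A threshold rule of the kind described above could, in the simplest case, be evaluated as in the following sketch; the parameter names, limits, and notification channels are invented for illustration and are not LoCa's actual rule format.

```python
# Hypothetical threshold rules: (parameter, upper limit, notification channel).
rules = [
    ("systolic_bp", 180, "sms"),      # severe: send an SMS to the physician
    ("systolic_bp", 150, "screen"),   # mild: highlight the value on screen
]

def notifications(parameter, value):
    """Return the channels to notify for a metered value, most severe first."""
    return [channel for name, limit, channel in rules
            if name == parameter and value > limit]

assert notifications("systolic_bp", 185) == ["sms", "screen"]
assert notifications("systolic_bp", 155) == ["screen"]
assert notifications("systolic_bp", 120) == []
```

Context-aware adaptation would then mean that this rule set itself changes at run-time, e.g. when a new sensor is added or further parameters have to be taken into account.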
The objective of the LoCa project is to address the challenges introduced above and provide reliable support for
workflow-based eHealth applications. This includes telemonitoring in home care as well as applications in stationary care. In close collaboration with stakeholders from the
healthcare domain, different use cases from both applications have already been defined. Finally, the LoCa system
will be applied and evaluated in a stationary care and in
a home care environment by the medical project partners.
In this paper, we focus on telemonitoring applications in a
digital home environment. From a functional perspective,
the goal is to gather, process, analyze, and visualize physiological data and to store aggregated data in the electronic
health record of a patient. In particular, the analysis and
visualization will be dynamically tailored to the patient’s
context. This includes sophisticated failure handling which,
by considering context at run-time, does not need to be prespecified in monitoring workflows. The system should finally be able to detect and anticipate potential cardiac irregularities or other health-related problems, based on criteria
defined by the medical partners in the project. From a systems point of view, LoCa will make use of and extend an existing platform for the reliable processing of data streams for
health monitoring across fixed and mobile devices [5, 6].
In this paper, we present the ongoing LoCa approach to
context-aware monitoring applications in digital homes. An
important constraint in this scenario is that users (patients)
are mobile, which means their context might frequently
change. Therefore, the way data — coming from different
soft- or hardware sensors — is analyzed needs to be automatically adapted, if necessary. The same is true for the
interaction of the user with the system. The basis of these
adaptations is a powerful context model and its exploitation
to dynamically adapt i.) user interfaces and services and ii.)
process-based distributed applications (workflows).
The remainder of this paper is organized as follows: Section 2 introduces the LoCa context model. The architecture
of the LoCa system is presented in Section 3. In Section 4,
we discuss context-aware adaptation in LoCa. The status of
the current implementation is presented in Section 5. Section 6 surveys related work and Section 7 concludes.
2. Context Model
LoCa exploits a generic context model to improve health
care applications and to facilitate the treatment of patients,
both in home care and in stationary care. To reach this
goal, we need to adapt processes and user interfaces automatically according to the current context. This, in turn,
necessitates the proper representation of context information. We have designed a generic context model for context
data management. Figure 1 depicts this model in Entity-Relationship notation. Here, we closely follow the well-established definition of context by Dey et al. [1]: Context
is any information that can be used to characterize the situation of a subject. A subject is a person, place, or object
that is considered relevant to the interaction between a user
and an application [...].
Figure 1. LoCa Context Model
The Subject can be a patient, a mobile phone, or an ECG
sensor. Conversely, profile data, the medical history, current
ECG data, or the current location are examples of context
information about a patient. The entity Context Object represents the actual context data, e.g., the value of the current
location, a document of the medical history, and so on. In
order to support data analysis, we store optional meta data
about context objects, such as time stamps and data accuracy (which usually depends on the type of sensor used).
The entity Data Generator (humans, hardware sensors,
software sensors) is designed to capture data about the instrument (sensor) which produces context data: a data generator generates context data about subjects. While many
data generators generate atomic data, some sensors may
produce compound context objects. For instance, the (GPS)
location usually consists of multiple values, such as longitude, latitude, altitude, speed, and bearing. Furthermore,
software sensors can combine different kinds of context objects to compose higher level context data. An alarm in
case of cardiac problems could be combined of information about the current activity of a patient and his current
ECG values. This is covered in the model by means of the
relationship logical combination.
The context model is able to handle different kinds of
context objects, including nested context objects. An important feature of the context model is its rather simple, yet
expressive structure. It is powerful enough to cover all the
different context objects that have been identified in the requirements analysis phase of LoCa in which several home
care and stationary care use cases have been analyzed together with stakeholders from the eHealth domain. Nevertheless, the model can be extended by adding new data
generators and thus also new context objects, if necessary.
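A minimal sketch of the entities in Figure 1, with illustrative field names: it shows a compound (GPS) context object and a software sensor that logically combines two context objects into higher-level context data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ContextObject:
    """Actual context data about a subject, with optional metadata."""
    value: object
    metadata: Dict[str, object] = field(default_factory=dict)   # e.g. timestamp, accuracy
    parts: List["ContextObject"] = field(default_factory=list)  # for compound objects

@dataclass
class DataGenerator:
    """Human, hardware sensor, or software sensor producing context data."""
    name: str
    kind: str  # "human" | "hardware" | "software"

# A compound context object: a GPS fix consists of several values.
gps_fix = ContextObject(
    value="gps-fix",
    metadata={"accuracy_m": 5},
    parts=[ContextObject(47.56), ContextObject(7.59), ContextObject(260.0)],  # lat, lon, alt
)

# A software sensor logically combining activity and ECG into higher-level context data.
def cardiac_alarm(activity: ContextObject, ecg_bpm: ContextObject) -> ContextObject:
    resting = activity.value == "resting"
    alarm = resting and ecg_bpm.value > 120  # the threshold is illustrative only
    return ContextObject(value=alarm, parts=[activity, ecg_bpm])

alarm = cardiac_alarm(ContextObject("resting"), ContextObject(135))
assert alarm.value is True
assert len(gps_fix.parts) == 3
```

The `parts` list covers both compound sensor output and the logical combination relationship; a production model would of course persist these entities in the global context database rather than in memory.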
3. Architecture of the LoCa Platform
Context awareness requires that the information gathered from distributed sensors is stored in a global, albeit distributed, database on the basis of the schema presented in Sec. 2. Prior to inserting raw sensor data into this database, it needs to be cleaned and transformed into the global schema. Since context data is a vital input for all LoCa applications, the context data management layer forms the basis of the LoCa architecture depicted in Figure 2.
Figure 2. LoCa Conceptual Architecture
On top of context management, the LoCa applications are defined as workflows. The basic assumption is that functionality is available in the form of (web) services so that workflows can be defined by combining existing services. Since complete workflows again have a service interface, service composition can be applied recursively. A crucial part of this layer is dynamic workflow adaptation. This layer makes use of the raw sensor data and their relationships stored in the context layer. The top-most layer of the LoCa architecture deals with the dynamic generation and adaptation of user interfaces. Again, this layer directly accesses the underlying context data management.
All layers are embedded in the LoCa infrastructure, which is described in more detail in Section 5. The LoCa architecture offers a unified interface for (individual, user-defined or pre-existing) workflow-based applications. According to the context model, LoCa workflow-based applications can themselves be considered software sensors, i.e., they might produce context objects which are subsequently needed for dynamic adaptation.
4. Context-aware Adaptation in LoCa
In what follows, we address the dynamic adaptation needed in LoCa for applications in the eHealth domain, namely at workflow (process) and at user interface level.
4.1. Context-aware Workflows
Traditional approaches to workflow management usually
consider static settings as they can be found in business processes or office automation. However, these approaches are
far too rigid to handle highly dynamic environments such as those that occur in the medical domain, especially when monitoring mobile patients in their (digital) home environment.
From a workflow management perspective, these applications are characterized by a potentially large number of i.)
exceptions or unforeseen events (e.g., abnormal deviations
in sensed physiological data that may require alternative
medication); ii.) different ways to achieve a goal (e.g., different devices can be used to meter blood pressure); iii.)
decisions that can only be made at run-time (e.g., results of tests cause different subsequent tests or treatments); and iv.) dynamic and continuous changes (e.g., new devices or treatment methods).
Context-aware, adaptable workflows offer much more
flexibility than traditional workflows as they allow for structural changes based on evolving user context. Basically,
structural changes of workflows can be done at build-time
(prior to the instantiation of workflow processes) and at
run-time (changing an instance of a workflow). Build-time
changes cover evolutionary changes of processes but also
changes caused by context changes like new methods of
treatments, hospital guidelines, laws, etc. These kinds of
workflow changes are not in the primary focus of LoCa. We
will mainly address run-time changes such as, for instance,
allergic hypersensitivity of patients that cause changes in
the treatment process (e.g., adding an allergy test).
There are two kinds of run-time changes [19]: process adaptation and built-in flexibility. Process adaptation, performed at run-time, is based on modification operations such as add, delete, or swap of process fragments.
Built-in flexibility supports the exchange of process fragments of a workflow. For instance, assume the examination
of a particular disease differs depending on the age of the patient, because the risk of contracting this disease and its severity increase with the patient's age. Thus, the examination always follows the same basic structure, while the concrete steps depend on the patient's risk group. Therefore, a workflow consisting of placeholders and concrete steps is defined at build-time. Steps that differ depending on age are defined as placeholder activities, and steps that do not differ as usual activities. At run-time, placeholder activities are replaced by concrete fragments depending on the patient's risk group.
Variants of built-in flexibility are described in [19].
Three of them are of particular importance for the eHealth
applications in LoCa: i.) late selection, ii.) late modeling,
and iii.) late composition. They differ in the degree of decision deferral and need for user experience. The least flexibility is offered by late selection, where workflows, defined
at build-time, contain placeholder activities that are substituted by a concrete implementation during run-time. Late
modeling additionally supports modeling of placeholder activities at run-time. The most flexible pattern is late composition. At build-time, only process fragments are specified. At run-time workflows are composed out of the process fragments available. In LoCa, we will adopt late composition and will make use of the services’ semantics (using
semantic web service standards) for the actual selection.
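The placeholder mechanism can be sketched as follows. This illustrates late selection, the simplest of the three variants; the activity names and risk groups are invented, and LoCa's actual selection would additionally use the services' OWL-S semantics.

```python
# A build-time workflow: strings are concrete steps, dicts are placeholders.
workflow = [
    "take medical history",
    {"placeholder": "examination"},   # resolved at run-time
    "discuss results",
]

# Run-time fragments per risk group (invented examples).
fragments = {
    "low":  ["basic blood test"],
    "high": ["basic blood test", "extended blood test", "ultrasound"],
}

def instantiate(workflow, risk_group):
    """Late selection: substitute placeholder activities with concrete fragments."""
    steps = []
    for step in workflow:
        if isinstance(step, dict) and "placeholder" in step:
            steps.extend(fragments[risk_group])
        else:
            steps.append(step)
    return steps

assert instantiate(workflow, "low") == [
    "take medical history", "basic blood test", "discuss results"]
assert len(instantiate(workflow, "high")) == 5
```

Late modeling would additionally allow the fragment table to be edited at run-time, and late composition would drop the fixed skeleton altogether and assemble the workflow from the available fragments.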
Applied to the scenario presented in Section 1, the treatment workflow has to be adapted dependent on the vital
parameters of the patient. Assume that the therapy is less
successful than expected so that the physician decides to
also meter the blood pressure of the patient. In this case
the workflow for controlling the patient’s health state has
to be extended accordingly. Usually, the physician is informed about irregularities in the patient’s ECG values by
visually highlighted values and, if severe problems occur,
by an SMS to his mobile phone. The extension to a new
sensor requires also the adaptation of the signal processing
and triggering.
In LoCa, we focus on run-time changes of workflows
without manual intervention. Particularly, we will provide
rules for automated adaptation of workflows, that is, automated fragment selection or composition based on user context and service semantics.
4.2. Context-aware User Interfaces
Adapting user interfaces in a context-aware environment
allows the various actors of the system to make the best possible use of the available resources. Therefore, simply defining one standard user interface (UI) design and adapting it
to the display of the device the user is currently using will
not be sufficient [10, 13, 21].
In LoCa, each user interface component (e.g., button, pulldown menu, picture) will be described in an artifact and
be interpreted at run-time. This generic description contains
the type of the component, its position within a hierarchy, a
mapping to the environment that allows listening to incoming information and a label.
Another artifact with a set of rules is responsible for
mapping the generic composite to a concrete representation for a given situation. This rendering mechanism is executed at run-time in order to choose the most suitable way to display the component. It takes into account
the following contextual information: i.) device: information about the current device, such as displaying capabilities, current network bandwidth and latency, CPU usage,
remaining battery time, etc. This might be a mobile device
of the patient or any device of the patient’s digital home environment; ii.) user: who is using the current device. This
information may also cover several users, such as the doctor
and a patient during a ward visit; iii.) location: the current
location of the device may also influence the rendering of
a component; iv.) reason: the reason why a component is
displayed may be difficult to obtain. Possible elements of
such information may be the current calendar entries or
tasks, the current patient situation, such as ECG; v.) time:
the dimension time is not simply a timestamp, but may also
include time spans or semantic information, such as “after
lunch” or “night”.
For the application scenario presented in Section 1, this
means for instance that the patient’s mobile device knows,
by making use of the calendar stored on it, that a specific
process needs to be started. The device displays the input fields required to enter the required physiological parameters. If the input field for the blood oxygen saturation
value is able to find a viable hardware sensor in its proximity (oximeter), it automatically reads the value from that
device, sets itself immutable and moves to the bottom of
the display. The mandatory input fields that cannot be processed automatically must be filled in by the patient. Each
input component must also decide how to react if, for example, the patient fills in a value before it could find a matching
hardware sensor in its environment.
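A minimal sketch of such rule-based rendering, with all component fields, context keys, and rules invented for illustration:

```python
# Generic description of one UI component (cf. the artifact described above).
component = {"type": "numeric-input", "label": "Blood oxygen saturation"}

def render(component, context):
    """Map the generic description to a concrete representation for this context."""
    rep = dict(component)
    if context.get("sensor_in_proximity"):
        # A viable hardware sensor (e.g. an oximeter) was found: read the
        # value automatically, make the field immutable, move it down.
        rep["value"] = context["sensor_value"]
        rep["mutable"] = False
        rep["position"] = "bottom"
    else:
        rep["mutable"] = True       # the patient must fill in the value manually
        rep["position"] = "top"
    return rep

auto = render(component, {"sensor_in_proximity": True, "sensor_value": 97})
manual = render(component, {})
assert auto["mutable"] is False and auto["value"] == 97
assert manual["mutable"] is True
```

The real rendering mechanism would evaluate the full rule artifact against all five context dimensions (device, user, location, reason, time) rather than a single flag.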
5. Implementation
The implementation of the LoCa infrastructure is currently ongoing. LoCa will use and further advance the open service-oriented infrastructure OSIRIS NEXT (ON; OSIRIS stands for Open Service Infrastructure for Reliable & Integrated process Support). Originally based on the hyperdatabase vision [16],
many ideas from process management, peer-to-peer networks, database technology, and Grid infrastructures were
integrated in the past in order to support distributed and decentralized process management [18]. More recent work
aims at i.) support for distributed data stream management [5, 6] and ii.) the integration of semantic technologies to enable new ways for flexible and automated process
management support. This includes support for distributed
and decentralized execution of processes in dynamic (mobile) environments [11] as well as an advanced method to
enable automated forward-oriented failure handling [12].
In the context of the LoCa project we will exploit and
extend the process management system that has been integrated into ON. It allows for dynamically distributed and
decentralized execution of composite semantic services that
are described based on OWL-S. On top of this, the user interface will be built based on the Android platform3 .
Figure 3. Screenshot of LoCa Demonstrator
ON essentially represents a P2P-based open service infrastructure. At its bottom layer it realizes a message-oriented middleware enabling arbitrary services which are deployed at peers to interact by means of message exchange.
Besides the possibility for end-to-end interactions, the platform also realizes a publish-subscribe messaging paradigm.
Furthermore, it incorporates advanced concepts for eager
and lazy data replication, taking into account user specified data freshness properties. The platform provides several built-in system services that are used to manage meta
and runtime information about the services offered by the
peers in the network [18].
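The publish-subscribe messaging paradigm that ON realizes can be illustrated in a few lines; this is a generic sketch of the pattern, not ON's actual Java API.

```python
from collections import defaultdict

class Broker:
    """Minimal publish-subscribe middleware: peers subscribe to topics and
    receive every message subsequently published on them."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

received = []
broker = Broker()
# A monitoring service on the physician's device subscribes to ECG readings.
broker.subscribe("ecg", received.append)
broker.publish("ecg", {"bpm": 72})
broker.publish("location", {"lat": 47.56})  # no subscriber: message is dropped
assert received == [{"bpm": 72}]
```

In ON, the same decoupling lets data stream operators and system services exchange messages without the publisher knowing which peers consume them.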
ON is fully implemented in Java. One of its key properties is its small system footprint (in particular regarding memory), and its internal design is strictly multithreaded in order to take advantage of multi-core CPU technology. Every service spawns its own thread group. Internal message
processing is similar to the SEDA approach [20]. It can
be deployed in a stand-alone mode on a wide range of devices, starting from mobile platforms, netbooks, up to enterprise computing machines. Moreover, ON can also be
deployed as an agent in the JADE agent platform, thus enabling FIPA-compliant usage.
For evaluation and demonstration of our approach, especially of our use cases, we are building a prototype based
on Android cell phones. Figure 3 shows an early prototype
of the user interface for a physician.
6. Related Work
In recent years, a number of projects have been carried out in the eHealth domain. In particular, many projects apply workflow and process technology for distributed applications in eHealth. Akogrimo [8] deals with the support of dynamic virtual organizations that require the ability to change their structure dynamically and to access data from mobile resources. ADEPT [15] makes it possible to dynamically change the type of workflow instances in order to react to changes in the application (e.g., a patient's therapy). While
ADEPT addresses mainly change patterns, CAWE (Context
Aware Workflow System) [3] deals with built-in flexibility.
A number of eHealth projects also take into account context. The MARC project [2] provides a passive monitoring
system that can be used for elderly people. CodeBlue [7]
explores various wireless applications in the eHealth domain with a focus on 3D location tracking. ARCS [17] addresses user interface adaptation in eHealth applications. It
provides web-based interfaces mainly for stationary devices
for manual disease monitoring. In [4], eHealth applications
and services to support mobile devices have been designed.
Online monitoring and data streaming are increasingly emerging in eHealth. The MyHeart project [14] monitors cardiovascular parameters using wearable measuring devices (i.e., devices that are integrated into clothes). The
PHM project [9] measures different vital parameters either continuously or at predetermined time intervals.
7. Conclusion and Future Work
LoCa is an ongoing effort that will provide a novel approach to context- and location-aware eHealth applications as they can be found when monitoring physiological data and the activity status of patients in a digital home environment. By providing generic support for the context-aware adaptation of workflows and user interfaces, LoCa is intended to be applicable to other scenarios as well, e.g., in stationary care.
In close collaboration with healthcare practitioners and experts from industry, we have identified several concrete scenarios. The requirements derived from these scenarios will be considered when completing the implementation of the LoCa system based on the ON platform. Finally, these scenarios will be evaluated together with our medical partners.

References
[1] G. Abowd, A. Dey, P. Brown, et al. Towards a Better Understanding of Context and Context-Awareness. In Proc. Int'l Symp. on Handheld and Ubiquitous Computing, pages 304–307, London, UK, 1999. Springer.
[2] M. Alwan, S. Kell, B. Turner, et al. Psychosocial Impact of Passive Health Status Monitoring on Informal Caregivers and Older Adults Living in Independent Senior Housing. In Proc. ICTTA'06, pages 808–813, 2006.
[3] L. Ardissono, R. Furnari, A. Goy, et al. A Framework for the Management of Context-aware Workflow Systems. In Proc. WEBIST 2007, pages 80–87, 2007.
[4] J. Bardram. Applications of Context-Aware Computing in Hospital Work – Examples and Design Principles. In Proc. ACM SAC, pages 1574–1579, 2004.
[5] G. Brettlecker and H. Schuldt. The OSIRIS-SE (Stream-Enabled) Infrastructure for Reliable Data Stream Management on Mobile Devices. In Proc. SIGMOD, pages 1097–1099, June 2007.
[6] G. Brettlecker, H. Schuldt, and H.-J. Schek. Efficient and Coordinated Checkpointing for Reliable Distributed Data Stream Management. In Proc. ADBIS'06, pages 296–312, Thessaloniki, Greece, 2006.
[7] T. Gao, D. Greenspan, M. Welsh, et al. Vital Signs Monitoring and Patient Tracking Over a Wireless Network. In Proc. IEEE EMBS, pages 102–105, 2005.
[8] T. Kirkham, D. Mac Randal, J. Gallop, and B. Ritchie. Akogrimo — a Work in Progress on the Delivery of a Next Generation Grid. In Proc. SOAS'05, 2005.
[9] C. Kunze, W. Stork, and K. Müller-Glaser. Tele-Monitoring as a Medical Application of Ubiquitous Computing. In Proc. MoCoMed'03, pages 115–120, 2003.
[10] P. Langley. User Modeling in Adaptive Interfaces. In Proc. UM'99, pages 357–370. Springer, 1999.
[11] T. Möller and H. Schuldt. A Platform to Support Decentralized and Dynamically Distributed P2P Composite OWL-S Service Execution. In Proc. MW4SOC'07. ACM, 2007.
[12] T. Möller and H. Schuldt. Control Flow Intervention for Semantic Failure Handling during Composite Service Execution. In Proc. ICWS'08, pages 834–835, 2008.
[13] E. G. Nilsson, J. Floch, S. O. Hallsteinsen, and E. Stav. Model-based User Interface Adaptation. Computers & Graphics, 30(5):692–701, 2006.
[14] M. Pacelli, G. Loriga, N. Taccini, and R. Paradiso. Sensing Fabrics for Monitoring Physiological and Biomechanical Variables: E-textile Solutions. In Proc. Int'l Symposium on Medical Devices and Biosensors, pages 1–4, 2006.
[15] M. Reichert, S. Rinderle, and P. Dadam. On the Common Support of Workflow Type and Instance Changes Under Correctness Constraints. In Proc. CoopIS'03, pages 407–425, Catania, Italy, Nov. 2003. Springer LNCS.
[16] H.-J. Schek and H. Schuldt. The Hyperdatabase Project – From the Vision to Realizations. In Proc. BNCOD, pages 207–226, Cardiff, UK, 2008.
[17] G. Schreier, A. Kollmann, M. Kramer, et al. Computers Helping People with Special Needs, volume 3118, pages 29–36. Springer, 2004.
[18] C. Schuler, C. Türker, H.-J. Schek, R. Weber, and H. Schuldt. Scalable Peer-to-Peer Process Management. Int. J. of Business Process Integration & Management, 2006.
[19] B. Weber, S. Rinderle, and M. Reichert. Change Patterns and Change Support Features in Process-Aware Information Systems. In Proc. CAiSE 2007, pages 574–588. Springer LNCS, Trondheim, Norway, June 2007.
[20] M. Welsh, D. Culler, and E. Brewer. SEDA: An Architecture for Well-Conditioned, Scalable Internet Services. In Proc. 18th Symp. on OS Principles, Banff, Canada, 2001.
[21] Z. Yu, X. Zhou, D. Zhang, et al. Supporting Context-Aware Media Recommendations for Smart Phones. IEEE Pervasive Computing, 5(3):68–75, 2006.
An Intelligent Web-based System for Mental Disorder
Treatment by Using Biofeedback Analysis
Bai-En Shie1, Fong-Lin Jang2, Richard Weng3, Vincent S. Tseng1,4*
1 Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
2 Department of Psychiatry, Chi-Mei Medical Center, Tainan, Taiwan, R.O.C.
3 Innovative DigiTech-Enabled Applications & Services Institute, Institute for Information Industry, Kaohsiung, Taiwan, R.O.C.
4 Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan, R.O.C.
*Correspondence E-mail: [email protected]
Abstract—With the rapid development of communication technology, the Internet plays an increasingly important role in many healthcare applications. Within healthcare, mental disorder treatment is an important topic, with cognitive behavioral therapy and biofeedback therapy as two emerging and noteworthy methods. A number of studies on integrating the Internet with mental healthcare have been proposed recently. This research therefore aims at developing an online treatment system for panic patients by combining cognitive behavioral therapy, biofeedback therapy, and web technologies. The system provides more convenient communication between patients and medical personnel. The essential treatments and related information provided by the medical personnel can be downloaded or used online by the patients via a web-based interface. Conversely, important information such as physiological data can be uploaded to the server databases automatically. Considerable treatment time can thus be saved for both patients and therapists, and medical costs can be greatly reduced. The experimental results show that the curative effects for mental disorder patients are highly dependent on their physiological status. The results of this research are expected to provide useful insights for the field of mental disorder treatment.

Keywords—biofeedback analysis, online mental therapy, mental disorder treatment, intelligent healthcare, data mining

1. Introduction

With the rapid development of electronic communication technology, the Internet plays an increasingly important role in the domain of medicine, especially in the healthcare field. A number of studies on the integration of the Internet and treatments for mental healthcare have been proposed [1, 2, 3, 4, 5, 6, 8, 14, 15, 16]. In these studies, Internet-based treatments were mostly used for melancholia and anxiety disorders. In addition, some studies applied the Internet to the treatment of patients with substance use disorders, such as smoking [4, 6] and alcoholism [5, 14]. That is, the researchers applied the Internet to cognitive behavioral therapies. This can not only save the time of patients and therapists while achieving the goal of treatment, but also reduce the cost of healthcare.

Recently, various mental disorders have become more and more prevalent in modern societies. Mental disorders are mainly divided into major and minor disorders. Minor mental disorders are mainly expressed as affective disorders, such as anxiety and depression, and thought disorders, such as obsession. However, these patients' cognitive thinking, logical inference, and self-checking abilities are generally normal. Patients with major mental disorders may show anxiety and obsession in the initial stage, but their cognition deteriorates severely, with the self-checking ability almost lost. Common minor mental disorders include anxiety disorder, obsessive-compulsive disorder, depression, and phobia; on the other hand, a common major mental disorder is schizophrenia.

Among mental disorders, panic disorder is a kind of chronic disease. It is a common condition among cases in hospital emergency rooms. The mental symptoms of patients with panic disorder are fear of losing control of themselves, derealization, depersonalization, and a feeling of impending death. The physiological symptoms are dizziness, dyspnea, tachypnea, and palpitations. The patients become very fearful and uncomfortable. Some severe patients are even afraid of going out, avoiding places such as open spaces, bridges, queues, cars, crowds, or other places from which it is difficult to escape [17]. For patients with panic disorder, the symptoms recur constantly and unexpectedly, which makes the sufferer feel highly distressed and apprehensive. Therefore, their avoidance behaviors become blatantly obvious: they endeavor to avoid the occasions they fear. In the later stage, they may even become melancholic and agoraphobic, which may result in a decline of their family functionality. The symptoms of panic disorder are not easily diagnosed. They are often diagnosed as heart attack or other diseases, and the patients may undergo many unnecessary medical check-ups. These symptoms not only waste medical resources and delay timely treatment, but also impair the social and occupational functionality of the patients [10, 20].
In view of these, we aimed at building an intelligent mental
disorder treatment system with the integration of cognitive
behavioral therapy, biofeedback therapy and web technologies
in this paper. The main contributions of this paper are as
follows. First, the system provides a convenient interface for
the communication between patients and hospital staff. Second, the hospital staff can make information available for patients to query or download via the Internet. Third, the patients can
upload their physiological data and self-rating scales to the
databases of the hospitals via the Internet.
For biofeedback measurements, we used a new biofeedback
device, named emotion ring, as shown in Figure 1 to record the
patient's finger skin temperature. Different from other biofeedback devices, the advantages of the emotion ring are its compact size, portability, ease of operation, and wireless data communication. We applied online progressive muscle relaxation training combined with emotion ring measurements to help patients learn how to relax themselves and alleviate the
symptoms of panic disorder. Once the patients learn the
somatic cues for relaxation and the method to obtain rapid
relaxation, they were able to apply the methods and cues to
relieve the symptoms of panic disorder. Moreover, we used
the proposed online therapy system for the patients to perform
the treatment courses themselves at home. We also requested
them to upload the biofeedback data via the system daily, whereby the therapists could quickly access the patients' latest data.
Furthermore, patients were asked to fill out the self-rating
scales online and upload them for the therapists, so that the
therapists could know the patients' mental status, judge their
curative effect, and give them some necessary feedbacks.
Fig. 1. The biofeedback device: emotion ring.
This paper presents the first system that integrates cognitive behavioral therapy, biofeedback therapy, and the Internet. We expect the system can be used by the patients to
practice biofeedback therapy at home. In this paper, we also
constructed a complete biofeedback online therapy model,
which was composed of cognitive behavioral therapy, data
transmission and storage, and connecting and interacting
between patients and therapists via the Internet. The results are expected to increase the convenience of mental therapy, decrease medical costs, allow more patients in need of mental therapy to be treated, and provide a beneficial application for public health in society as well as in academia. In the
experiments section, we employ the data to explore the
possibility of giving mental healthcare with physiological data.
We expect that the system can assist the prevention and
treatment of mental disorders by monitoring the physiological
data with real clinical verification.
The rest of this paper is organized as follows. In Section 2,
we summarize existing research on panic disorders. In Section 3, we describe the proposed online treatment system for
panic disorders in detail. The performance study of our
research is presented in Section 4. Section 5 is the conclusion
of the paper.
2. Related Work
Panic disorder is encountered frequently in general medical practices and emergency services. The data from the National Comorbidity Survey Replication of the United States showed that the lifetime prevalence estimates are 3.7% for panic disorder without agoraphobia (panic disorder only) and 1.1% for panic disorder with agoraphobia [7]. The international lifetime prevalence rates of panic disorder range from 0.13% in a rural village in Taiwan to 3.8% in the Netherlands [18]. This
disorder is rather debilitating to the sufferer, and even causes
depression or suicide [20]. The life quality of the victims of panic disorder is dismal, even worse than that of patients with major depression [10]. The victims of panic disorder also receive more welfare or some form of disability compensation [13].
For public health, the optimal treatment for panic disorder is
an important task to be dealt with. In clinical practice, two
major modalities have been applied to the treatment of panic disorder: one is pharmacotherapy and the other is
non-pharmacological psychotherapy. For psychotherapy,
cognitive behavioral therapy is the main mode and has been
proved to be effective for symptom management and
prevention of recurrence for panic disorder [17, 21]. Thanks to
the advancement in computing and the Internet,
computer-aided cognitive behavioral therapy has been
employed for more than a decade. The term denotes any computing system that supports cognitive behavioral therapy by making computations and treatment decisions [11]. However, computer-aided cognitive behavioral therapy should not merely expedite communication or overcome the problem of distance; it should involve genuine computation rather than just replacing routine paper leaflets [12].
Most Internet interventions for mental disorders are
cognitive behavioral programs that are proposed as guided
self-help programs on the Internet. Randomized controlled
studies on the use of Internet interventions for the treatment of
mental disorders are still scarce [15]. The limited literature showed that computer/Internet-aided cognitive behavioral therapy was superior to waiting-list and placebo assignment across outcome measures, and the effects of
computer/Internet-aided cognitive behavioral therapy were
equal to therapist-delivered treatment across anxiety disorders.
However, conclusions were limited by small sample sizes, the
rare use of placebo controls, and other methodological
problems [16].
Treating panic disorder sufferers via the Internet is a rational
concept, not only considering the issue of transportation of
patients but also that of those suffering from agoraphobia. To date, publications about clinical trials of Internet-based
cognitive behavioral therapy for panic disorder were mainly
from Sweden, United Kingdom, and Australia. Carlbring
constructed a cognitive behavioral therapy treatment program
consisting of stepwise intervention modules: psychoeducation,
breathing retraining and hyperventilation test, cognitive
restructuring, interoceptive exposure, exposure in vivo, and
relapse prevention [1]. The participants got significant
improvement in all dimensions of measures. They further
compared an Internet-based treatment program with an applied relaxation program, which instructed the participants on how to relax expediently and to apply relaxation techniques to prevent a relapse into a panic attack [2]. The applied relaxation condition had a somewhat better overall effect than the cognitive behavioral therapy program, although the effectiveness of the two groups was similar.
Internet-based cognitive behavioral therapy for panic disorder
could be as cogent as traditional individual cognitive behavior
therapy [3, 8].
3. The Proposed Online Therapy System

In this paper, we integrate our mental disorder therapy system with the Internet to efficiently collect the biosignal data, the self-rating scales, and the personal profiles of patients with mental disorders. In this section, we describe the scenarios and functions of the proposed online therapy system.

A. User Scenarios
There are four kinds of users in this system: patients, therapists, hospital managers, and system managers. In the following, we explain the user scenarios in detail.
1) The scenario of the patients with mental disorders: The patients use the finger temperature measurement system and upload the results to the databases daily. Weekly or monthly, they also need to fill out the self-rating scales provided by the therapists in the system. The patients can further query their own treatment records or view the suggestions provided by their therapists.
2) The scenario of the therapists: The therapists use the system to manage the data uploaded by patients, namely the finger temperature measured by the patients daily and the self-rating scales filled out by the patients weekly or monthly. The therapists can also reply with suggestions to the patients after reviewing the data. When the patients later log into the system, they can conveniently check these suggestions. Besides, the therapists can create new patient accounts by themselves without involving the database managers. When a patient finishes the treatment procedure, the therapist can directly close the case in the online system.
3) The scenario of the hospital managers: If a hospital manager is also a therapist, he/she can manage his/her patients like a therapist does in the system. Besides, the hospital managers can manage all therapists in their hospital via the system, and they can create new therapist accounts by themselves without contacting the database managers.
4) The scenario of the system managers: The system managers do not actually need to use the system; they just manage and maintain it. They can create new accounts for hospital managers. However, since the treatment records cannot be made arbitrarily public, the system managers cannot see patients' treatment data.
The login roles of the users are shown in Figure 2. As the figure shows, the top management role of this system is the system manager, who can create the accounts of the hospital managers. For each individual hospital, there is exactly one hospital manager, who handles all the therapists using the system in that hospital. In turn, the therapists manage all their own patients via the system.
Fig. 2. Sketch map of login roles.
B. Techniques for Measuring Finger Temperature
In the following, we describe the communication processes
and methods between the emotion rings and the computers.
First, we install the device driver of the emotion ring. After
installation, the MAC address of the emotion ring and the
detected temperature will be transmitted from the emotion ring
to the USB receiver once a second. When the USB receiver
gets data, it simulates a COM port and forwards the data in frames of 11 bytes. Table I shows an example of the transmitted data. The
first byte is fixed as “A3”. The second to the ninth bytes are the
MAC address of the emotion ring. The last two bytes are the
temperature data. The first four bytes of the MAC address are the same for all emotion rings, "001CD902". The received temperature value is ten times the actual temperature.
TABLE I. An example of the transmitted data

Header  | MAC Address             | Temperature
A3      | 00 1C D9 02 00 00 00 3B | 01 0A
1 Byte  | 8 Bytes                 | 2 Bytes
The receiving end runs as a Java applet. Since the basic Java libraries do not support input and output over serial ports, the user's Java environment is detected and the required libraries are set up. The receiver program then needs to search for a free COM port for receiving data.
After data are received, the information is checked from the last eleven bytes to the last seven bytes, instead of from the first to the fifth bytes. The reason is that, if errors occur during data transmission, the receiver may pick up data starting in the middle of the previous frame instead of at the first byte of the latest frame. Checking from the end of the received data therefore avoids such errors.
After checking the received data, the program acquires the data from the COM port once a second and outputs the number that is one-tenth of the value in the last two bytes. For the example in Table I, the decimal value of the last two bytes, i.e., 010A, is 266, and one-tenth of it is 26.6. This indicates that the temperature detected at that time is 26.6°C. However, sometimes the USB receiver may not receive any data due to poor signal strength. The emotion ring is regarded as absent when the program does not detect any data for three seconds.
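The frame validation described above can be sketched as follows. This is a minimal illustration, not the authors' Java applet code: the function and constant names are ours, and only the frame layout of Table I (fixed header "A3", shared MAC prefix "001CD902", temperature encoded as ten times the actual value in the last two bytes) is taken from the text.

```python
HEADER = 0xA3
MAC_PREFIX = bytes.fromhex("001CD902")  # first four MAC bytes of every ring
FRAME_LEN = 11

def parse_frame(buf: bytes):
    """Return the temperature (deg C) of the newest complete frame, or None."""
    if len(buf) < FRAME_LEN:
        return None
    frame = buf[-FRAME_LEN:]          # check from the end of the buffer,
    if frame[0] != HEADER:            # not the start, so a partial previous
        return None                   # frame cannot be mistaken for data
    if frame[1:5] != MAC_PREFIX:
        return None
    raw = int.from_bytes(frame[9:11], "big")  # last two bytes: 10x temperature
    return raw / 10.0

# The example frame of Table I: A3 | 00 1C D9 02 00 00 00 3B | 01 0A
sample = bytes.fromhex("a3001cd9020000003b010a")
print(parse_frame(sample))  # -> 26.6, matching the worked example above
```

Validating from the tail of the buffer mirrors the paper's design choice: after a transmission error, the most recent 11 bytes are the only span guaranteed to belong to a single frame.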
C. System Workflow
First, users enter the system website and log in. Figure 3 is a screenshot of the patients' homepage. On this webpage, we remind the patient whether or not he/she has completed the daily course. If the patient has not completed it, the instructions on the related webpages will lead him/her to do so. If there are self-rating scales to be completed, this is also mentioned on the homepage. In this way, the patients will not forget the routine tasks they need to complete on that day. If the patients want to query their previous finger temperature records or self-rating scales, or view their therapists' suggestions, they can find them on the "records review" pages. On the "contact therapist" page, the contact information of the therapists, such as e-mail addresses, is provided for the patients.
Fig. 3. Screenshot of the homepage for patients.
The system flowchart is shown in Figure 4. As can be seen, the patients can select the functions arbitrarily when they log into the system. For convenience in using the system and to reduce confusion for the users, the system details the options and procedures the user has to complete on that day. Through the guidance of the system, a patient may use the system as follows. First, he logs into the system and is informed by the homepage that he has not completed the daily treatment course for that day. He then completes it and uploads the temperature data to the system database. Next, he returns to the homepage and finds that he has a self-rating scale to complete, so he completes it. After finishing that day's necessary tasks, he visits the pages showing the suggestions his therapist gave the previous day, the finger temperature results, and the self-rating scales uploaded that day. In the end, he logs out of the system.
Fig. 4. Flowchart of the system (for patients).
As shown in Figure 4, the main functions of the system are measuring finger temperature, filling out self-rating scales, and uploading the data. The function of measuring temperature is integrated into the online therapy system. The patients simply click the "start measurement", "pause measurement" or "end measurement" buttons and can then easily complete the required tasks. After the measurements, the data are uploaded to the database automatically by the system. This spares the users the confusion of juggling several unconnected programs, such as one for measuring temperature and another for uploading data. The simplicity of the system promotes the patients' willingness to participate, which in turn popularizes it among the participating patients.
On the other hand, the hospital manager may also be a therapist, so some user functions of the hospital manager and the therapist are the same. The main functions of the therapists are managing the patients, which entails viewing the daily data, replying with suggestions to the patients, viewing their periodic self-rating scales, filling out the patients' self-rating scales, adding new cases, and so on. Besides the above functions, the main functions of the hospital managers are adding new therapists and managing them.
4. Experimental Evaluation
In this section, we introduce the data sources, the experimental design, the results, and a discussion of the research.
A. The Real Data for Experiments
In the experimental analyses, we use the data obtained from
subjects from the department of psychiatry in a medical center
in Taiwan. In this research, we gave each patient a muscle relaxation course (i.e., muscle relaxation music), a biofeedback device (i.e., the emotion ring), and an account for logging into the system. The patients were asked to practice the online
treatment courses and upload the daily results every day. The
patients would also upload the scores of their emotions before
and after the courses and also the feelings during the courses to
the database. The therapists would review the data periodically and give the patients feedback or suggestions if necessary.
In this research, the patients were divided into an
experimental group and a control group. The patients in the
experimental group did the courses as mentioned above, i.e., listening to muscle relaxation music while measuring the finger temperature. On the other hand, the
patients in the control group just listened to the muscle
relaxation music without temperature measuring. The control
group was mainly used for verification in the experiments.
During this research, we collected the patients' personal
profiles and physiological data by different mechanisms.
Among them, the physiological data were extracted and
collected by the emotion rings. After data collection, we
utilized our data mining system to analyze the data. Before the analyses, we preprocessed the collected data. In this step, we focused on missing data and performed essential data cleaning and some integration. For example, some data were converted to another format, and redundant or missing data were deleted. Thus, the processing time is reduced and the accuracy of the experiments is improved.
B. Experimental Design
In this part, we describe the data analysis method for the collected data, i.e., the patients' profiles and the biofeedback data. We integrate data mining techniques with professional knowledge of mental disorders to design the data mining analysis methods.
The proposed data mining analysis is the association analysis
of curative effect and the biofeedback data. The framework of
this analysis is shown in Figure 5. We analyze the association
between the biofeedback data extracted from the emotion rings
and the curative effects. In this analysis, the finger temperature
data is regarded as time series data. We apply the SAX
algorithm [9] to transform the numerical data to sequence data.
After data transformation, we apply sequential pattern mining
to the sequence data for finding sequential patterns. Then we
apply the CBS algorithm [19] for building classification
models of the curative effects. The results could serve as useful references to assist the therapists in predicting the curative effect from the treatment conditions.
Fig. 5. The framework of the analysis of the curative effect and biofeedback data.
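The first step of this pipeline, the SAX transformation of the finger temperature series, can be sketched as follows. This is a minimal illustration of the general SAX technique, not the authors' implementation: the segment count and the 4-symbol alphabet are our assumptions, not the configuration used in the paper.

```python
import numpy as np

def sax(series, n_segments=8, alphabet="abcd"):
    """Discretize a numeric time series into a SAX symbol string."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)        # z-normalize
    # Piecewise Aggregate Approximation: mean value of each segment
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    # Breakpoints cutting the N(0,1) range into 4 equiprobable regions
    breakpoints = [-0.6745, 0.0, 0.6745]
    return "".join(alphabet[i] for i in np.searchsorted(breakpoints, paa))

# A steadily rising finger-temperature curve yields low-to-high symbols
print(sax([25.0, 25.2, 25.5, 26.0, 26.4, 26.8, 27.1, 27.3]))
```

The resulting symbol strings are what the subsequent sequential pattern mining and the CBS classifier operate on.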
C. Experimental Results
In this part, we present the experimental results of the analysis of the curative effect and the biofeedback data. We use the real datasets mentioned above for the analysis. Before the analysis, we apply data preprocessing methods to prune missing or erroneous data as follows. A tuple whose temperature differs from the previous one by more than 2°C is considered an error and pruned, since a human's temperature will naturally not change by more than 2°C within one second. Such readings occur when the battery of the device is flat or the patient interrupts the course, e.g., when the emotion ring is suddenly removed from the finger.
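The pruning rule above can be sketched as follows; this is an illustrative reading of the rule, in which each once-a-second sample is compared with the last retained sample, and the function name is ours.

```python
def prune(samples, max_jump=2.0):
    """Drop temperature samples that jump more than max_jump deg C."""
    cleaned = []
    for t in samples:
        if cleaned and abs(t - cleaned[-1]) > max_jump:
            continue  # flat battery or ring removed mid-course: drop reading
        cleaned.append(t)
    return cleaned

print(prune([26.0, 26.1, 31.5, 26.2]))  # the 31.5 spike is discarded
```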
With regard to the curative effects, we use two types of scores to judge them objectively and subjectively. One is the self-rating scores determined by the patients themselves, and the other is the curative effects determined by the patients' therapists. We perform the following two experiments under different conditions.
Experiment A. In this experiment, we take all patients'
biofeedback data. We set the class for each tuple according to
the patients' self-rating scores. If the scores after the courses
are better than the scores before, we regard the treatment
effects as "good"; otherwise, they are considered as "bad." The
class values of the tuples in this experiment are just good or
bad. After data preprocessing, we divide the data into training
data and testing data with the ratio of 7:3. The experimental
results are shown in Table II. In Table II and Table III, the
column "inner testing" means the accuracy of the training data
and "outer testing" means the accuracy of testing data. By
Table II, we can see the overall accuracy is high, i.e., above
80%. It can be seen from this that the curative effects are
highly dependent on the biofeedback data. Furthermore, we can
also know that the biofeedback data can really reflect the
patients' mental state. The results could be important for the
therapists' diagnosis.
TABLE II. Results of Experiment A (inner and outer testing: precision, recall, and F-measure of class "good").
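The labeling and split procedure of Experiment A can be sketched as follows. This is a minimal illustration: the record fields, the helper names, and the assumption that a higher self-rating score means a better emotional state are ours, not taken from the paper.

```python
import random

def label(record):
    """'good' if the self-rating score improved after the course, else 'bad'."""
    return "good" if record["score_after"] > record["score_before"] else "bad"

def split_7_3(records, seed=0):
    """Shuffle and split records into training (70%) and testing (30%) sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]

data = [{"score_before": 4, "score_after": 7},   # improved  -> "good"
        {"score_before": 6, "score_after": 5}]   # worsened  -> "bad"
print([label(r) for r in data])
```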
Experiment B. In this experiment, we take all patients'
biofeedback data. We set the class to each tuple according to
the curative effect which is determined by the therapists. There
are three kinds of curative effect which is judges by therapists:
good, bad, and medium. In this experiment, we use the tuples
with the class good and medium. We also divide the data into
training data and testing data by 7:3. The experimental results
are shown in Table III. By Table III, we can observe that the
results are a little worse than Experiment A. These is because
the therapists took into account not only the patients'
biofeedback data and the self-rating scores, but also the
patients' feelings and moods during the courses. These might
cause some variants on the previous experimental results
whose curative effects are judged by using only patients'
biofeedback data.
[Table III: inner- and outer-testing accuracy, with precision, recall, and F-measure of the class "good"]
From the above experiments, we can ascertain that the
curative effects are highly dependent on the biofeedback data,
i.e., the curves of finger temperature, for patients with panic
disorder. By using this system, we can better monitor the
patients' status while they are performing the biofeedback
therapies. In other words, we can know not only the patients'
physical state but also their mental state while they are
participating in the courses.
In this paper, we have proposed a web-based online
therapeutic system for mental disorders. The contributions of
our system are as follows. First, patients can get the
information and services provided by the system.
Second, patients can measure and upload their
physiological status via the system. Third, therapists and
hospital managers can manage their patients conveniently
via the system. The experimental results show that the
biofeedback data are useful for judging the curative effects
for patients with panic disorder. As future work, we
will port the system to mobile platforms such as mobile
phones and PDAs so that users can use it more
conveniently and ubiquitously.
This research was supported by the Applied Information
Services Development & Integration project, Phase II of
Institute for Information Industry and sponsored by MOEA,
Adaptive SmartMote in Wireless Ad-Hoc Sensor Network
Sheng-Tzong Cheng1, Yao-Dong Zou1, Ju-Hsien Chou1, Jiashing Shih1, Mingzoo Wu2
Department of Computer Science and Information Engineering, National Cheng Kung University,
Tainan, Taiwan
Innovative DigiTech-Enabled Applications & Services Institute, Institute for Information Industry,
Kaohsiung, Taiwan
Sensor nodes may need to be reprogrammed, e.g., to
update the running program. An additional module may
have to be added to the program, or a complete protocol
implementation exchanged. Another important reason is the
design-implement-test iterations during the development
cycle. It is highly impractical to physically reach all nodes
in a network and manually reprogram them by attaching each
node to a laptop or PDA, especially for a large number of
distributed sensors. It may also be simply infeasible in
various scenarios, if the nodes are located in areas that are
unreachable.
A wireless updating scheme is required to bring all nodes
up to date with the new version of the application. Another
consideration is the amount of code transferred. While it is
normal to send the whole code if the application needs to
be replaced, it does not make much sense in other cases. If
we just add or exchange a part of the code, we transmit
code that is already available on the node, perhaps just
shifted from its original location in program memory by a
certain offset. Likewise, if a bug has been identified and fixed
during testing, the biggest part of the code remains
exactly the same, probably differing only in some
functions or constants. To reduce this redundancy, it is
much more efficient in terms of bandwidth and time to
send only the changes in the code, and leave the
recombination of the new code to the node itself.
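The "send only the changes" idea can be sketched with a hypothetical minimal block-level delta, which is not the format of MOAP, XNP, or any real protocol: the host emits COPY commands for bytes already on the node and ADD commands for new bytes, and the node rebuilds the new image from its old one.

```python
BLOCK = 4  # toy block size; real schemes use much larger blocks

def make_delta(old, new):
    """Compute a COPY/ADD edit script turning `old` into `new`."""
    index = {old[i:i + BLOCK]: i for i in range(0, len(old) - BLOCK + 1)}
    script, i = [], 0
    while i < len(new):
        chunk = new[i:i + BLOCK]
        if len(chunk) == BLOCK and chunk in index:
            script.append(("COPY", index[chunk], BLOCK))  # already on the node
            i += BLOCK
        else:
            script.append(("ADD", new[i:i + 1]))          # must be transmitted
            i += 1
    return script

def apply_delta(old, script):
    """Node side: reconstruct the new image from the old one plus the script."""
    out = bytearray()
    for cmd in script:
        if cmd[0] == "COPY":
            _, off, n = cmd
            out += old[off:off + n]
        else:
            out += cmd[1]
    return bytes(out)
```

Only the ADD payloads and the small command stream cross the radio; unchanged code, even if shifted by an offset, is reused from the node's own memory.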
Abstract—This paper describes an update mechanism for
large wireless ad-hoc sensor networks (WASNs). In
wireless sensor networks, the nodes may have to be
reprogrammed, especially during design-implement-test
iterations. Manual reprogramming is very cumbersome
work, and may be infeasible if nodes of the network are
unreachable. Therefore, a wireless update mechanism is
needed. Exchanging the running application on a node by
transmitting the complete program image is not efficient for
small changes in the code; it consumes a lot of bandwidth
and time. The proposed framework, Adaptive SmartMote,
defines and supports control JOBs that allow computation
and the updating of sensor behaviors. The goal of this paper
is to use programmable packets to update sensor behaviors.
To reduce the code transferred and the power consumption,
we propose a group management architecture. This
architecture helps reduce power consumption and increases
the number of nodes that can be controlled by a Leader Node
in WASNs. The proposed update protocol has been
implemented on the Tmote-based Octopus II sensor node,
named SmartMote, running TinyOS [1], a component-based
operating system for highly constrained embedded platforms.
1. Introduction
In our daily life, we encounter sensors of all different
kinds without even taking notice of them. Motion sensors turn on
lights when we walk by, the heating or air conditioning of
rooms is controlled by temperature sensors, and fire
detectors alert us in case of emergency.
Recently, a lot of attention has been directed toward
extended, "active" or "intelligent" sensors that not
only conduct certain measurements but are also equipped with
computational power and over-the-air communication. Many
additional application areas have appeared for these
new devices, ranging from medical applications, home
automation, traffic control, and monitoring of eco-systems
to security and surveillance applications.
Researchers have mostly been concerned with
exploring application scenarios, investigating new routing
and access-control protocols, proposing new energy-saving
algorithmic techniques, and developing hardware
prototypes of sensor nodes.
2. Related Work
We give an overview of existing approaches, which vary
from single-hop reprogramming over multi-hop reprogramming
to complete virtual-machine approaches [6].
2.1 XNP
One of the very first approaches used to reprogram
sensor nodes was included in TinyOS. With XNP [2][3][7],
mica2 and mica2dot nodes can be reprogrammed over the
air. Only complete images can be transferred to the node,
since XNP does not consider identical code parts. There is
no forwarding mechanism in the program, so only the
nodes in the immediate neighborhood of the base station
can be reprogrammed. This is also called single-hop
reprogramming.
3. System Architecture
3.1 Network Topology
There are three types of nodes in a WASN: Leader
Node, Function Node, and Sensor Node. They cooperate
with each other to process the necessary data, in order to
achieve the goal of distributed computation and power
saving.
Fig. 1 shows adaptive SmartMote packet transmission
in the network. The user uses instructions defined by the
system to set node behavior. The instructions are compiled
to byte codes on a computer, which encapsulates the byte
codes into a network packet and transmits the packet to the
Leader Node. After the Leader Node receives and parses the
packet with SmartMote, it distributes the packet to the
Function Nodes in the network or executes the code itself.
The instructions we propose describe behaviors affecting
the target nodes.
The packet described above contains instructions.
When nodes send or receive such packets, the packets enable
the nodes to adopt new behaviors. For instance, if a packet
describes a computing instruction, nodes will compute over
their sensing data based on that instruction after SmartMote
parses the packet. The data processed by the node are then
passed to a PC. The distributed architecture therefore enables
distributed computation of data as well as updates of a node's
behavior. Two issues follow: 1) how to manage groups of
nodes, and 2) how to structure the architecture of the
wireless sensor network.
Fig. 1: Adaptive SmartMote packet transmission
2.2 Multi-Hop Over-The-Air-Programming
Multi-hop Over-The-Air Programming (MOAP) [3]
uses basic commands for an edit script, but adds some
special copy commands. The script is computed separately
for the code and the data parts of the object file, and
merged afterwards; some copy commands can be
optimized that way. For dissemination, an algorithm called
Ripple is used, which distributes the code packets to a
selected number of nodes instead of flooding the network.
Corrupted or missing packets are retransmitted using a
sliding-window protocol, which allows the node to process
or forward received packets while waiting for the
retransmission of a missing packet.
2.3 Trickle
Trickle [4] is the epidemic algorithm used by Deluge
for propagating and maintaining code updates in wireless
sensor networks. A “polite gossip” policy is applied, where
nodes periodically broadcast a code summary to the local
neighbors, but stay quiet if they have recently heard a
summary identical to theirs. A node that hears an older
summary than its own broadcasts an update. Instead of
flooding the network with packets, the algorithm controls
the send rate so each node hears a small trickle of packets,
just enough to stay up to date. An implementation of
Trickle is contained in TinyOS 2.x.
a) Node Classification and Description
According to node hardware, capability, and
remaining power, two kinds of nodes are specified: Super Node
and Sensor Node. A Super Node provides data computing,
coordination, and communication. A Sensor Node, in contrast,
just collects the necessary data and transmits it to a Super Node,
or reacts according to the behavior that the Super Node assigns
to it. A Super Node differs from a Sensor Node not only in
hardware specification but also in inner component structure. A
Super Node enables real-time updating of its entire code storage,
altering its behavior; a Sensor Node instead has a few
algorithms hard-coded but tunable through
the transmission of parameters. In addition, Super
Nodes can carry out Leader Node election. The Leader Node is a
cluster's head and the others are Function Nodes.
2.4 SensorWare
In SensorWare [5], the developers set very high
requirements on the hardware. It does not fit into the
memory of popular sensor nodes and targets richer
platforms to be developed in the future. In contrast to Maté,
complex semantics can also be expressed. The program
services are grouped into theme-related APIs with
Tcl-based scripts as the glue. Scripts located at various
nodes use these services and collaborate with each other to
orchestrate the data flow, assembling custom networking
and signal-processing behavior. Application evolution is
facilitated by editing scripts and injecting them into
the network. Both SensorWare and Maté can update an
application by replacing high-level scripts; they cannot
permit the lower-level binary code to be modified.
b) Leader Node Election and Inheritance
Each Super Node begins in the competition state.
After the competition, one Leader Node and several Function
Nodes are identified, and a table recording the result of the
election is kept in each node. If the present Leader Node is
destroyed, or its power comes down to the threshold limit
value, a backup scheme is started so that the leadership is
inherited. Hence, we are able to generate a new Leader Node
efficiently and increase performance effectively.
The condition for inheriting the leadership depends on the
power threshold. When the power of the present
Leader Node reaches the threshold limit value, the present Leader
Node starts handing over its job to the Function Node
that ranks first in the inheritance table. Finally, the
new Leader Node broadcasts an update message to all nodes.
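The election and inheritance rule can be sketched as follows; ranking Super Nodes by remaining power and the numeric threshold are illustrative assumptions, since the paper does not specify its ranking criterion:

```python
THRESHOLD = 10.0  # assumed power threshold (units arbitrary)

def elect(super_nodes):
    """super_nodes: dict of node_id -> remaining power.

    Returns the inheritance table: rank 0 is the Leader Node, the
    remaining entries are Function Nodes in inheritance order.
    """
    return sorted(super_nodes, key=super_nodes.get, reverse=True)

def current_leader(table, power):
    """Walk the inheritance table, skipping nodes whose power has
    reached the threshold limit value (or that have been destroyed)."""
    for node in table:
        if node in power and power[node] > THRESHOLD:
            return node
    return None  # no eligible Super Node left
```

When the acting leader's power drops to the threshold, the next table entry simply takes over, which is why a new Leader Node can be produced without re-running the election.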
As SmartMote's design progressed over time, the set
of commands changed considerably. We started with some
basic commands and APIs for object mobility, along with
some commands for timer, network, and sensing
abstraction, and kept adding commands as necessary.
SmartMote declares, defines, and supports the creation of
virtual devices. All abstraction services are represented as
virtual devices, and there is a fixed interface for all devices.
An intuitive description of a sensor-node task (a part
of a distributed application) has the form of a state machine
that is influenced by external events. This is also the form
of SmartMote JOBs. The programming model is as
follows: an event is described, and it is tied to the
definition of an event handler. The event handler, according
to the current state, does some processing and possibly
creates new events and/or alters the current state. For
example, a task may wait for event a, b, or c; if a device can
produce events, a task in a waiting state is needed to accept
the device's events.
Although JOBs define behavior at the node level,
SmartMote is not a node-level programming language. It
can better be viewed as an event-based language, since the
behaviors are tied not to specific nodes but rather to
possible events that depend on the physical phenomena and
the WASN state.
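The event-handler model above can be sketched as a tiny state machine; the class and event names are illustrative, not the SmartMote API:

```python
class Job:
    """A JOB as a state machine driven by external events."""

    def __init__(self, state="waiting"):
        self.state = state
        self.handlers = {}  # event name -> handler(job, payload)

    def on(self, event, handler):
        """Tie an event to the definition of an event handler."""
        self.handlers[event] = handler

    def dispatch(self, event, payload=None):
        """Deliver an event; the handler may alter the current state."""
        h = self.handlers.get(event)
        return h(self, payload) if h else None

job = Job()

def on_sense(job, value):
    # Handler: process the reading and possibly alter the current state.
    if value > 30:
        job.state = "alarm"
    return value

job.on("sense", on_sense)
```

A JOB starts in a waiting state, and each dispatched event runs its handler, which decides, based on the current state, what processing to do and which state to enter next.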
Fig. 2: Sensor framework
3.2 Sensor Framework
Fig. 2 shows SmartMote's place inside a layered sensor
node framework. The lower layers are the raw hardware
and the hardware abstraction layer (i.e., the device drivers).
TinyOS sits on top of these low layers and provides all
the basic services and components, within the limited available
resources, that are needed by the layers above it. The
SmartMote layer uses the functions and services offered
by TinyOS to provide the run-time environment for the
control JOBs; this layer, for instance, includes the handlers
that events register with. The control JOBs rely completely on
the SmartMote layer while propagating through the network.
Control JOBs use the native services that SmartMote
provides, as well as services provided by other JOBs, to
construct applications. Two things comprise SmartMote: 1)
the language and 2) the supporting run-time environment.
b) The run-time environment
Fig. 3 depicts an abstracted view of SmartMote's
run-time environment. Most of the running threads are
coupled with a generic queue. Each thread "pends" on its
corresponding queue until it receives a message in the
queue. When a message arrives it is promptly processed;
then the next message is fetched or, if the queue is empty,
the thread "pends" again on the queue. A queue associated
with a JOB thread receives events (i.e., reception of
network messages, sensing data, or expiration of timers). A
queue associated with one of the three resource-handling
tasks receives events of one type (from the specific device
driver it is connected to), as well as messages that
declare interest in this event type. For instance, the Sensing
resource-handling task receives sensing data from the
device driver and declarations of interest in sensing data from
the JOBs. The JOB Manager queue receives messages from the
network that wish to spawn a new JOB. There are also
system messages that are exchanged between the system
threads (like the ones that provide the Admission Control
thread with resource-metering information, or the ones that
control the device drivers).
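The "pend on a queue" pattern can be sketched with standard threads and queues; this is purely illustrative, since the real system runs on TinyOS rather than Python threads:

```python
import queue
import threading

def worker(q, results):
    """A thread that 'pends' on its queue and processes each message."""
    while True:
        msg = q.get()        # blocks ("pends") until a message arrives
        if msg is None:      # shutdown sentinel
            break
        results.append(msg.upper())  # promptly process the message
        q.task_done()

q = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(q, results))
t.start()
for m in ("sense", "timer", "net"):
    q.put(m)   # events delivered to the thread's queue
q.put(None)
t.join()
```

Each resource-handling task in Fig. 3 follows this shape: block on the queue, wake on a message, process it, and block again when the queue drains.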
a) The language and programming model
First, a language needs commands to act as the basic
building blocks of the JOBs. These commands are
essentially the interface to the abstraction services offered
by SmartMote. Simple examples include timer services,
acquisition of sensing data, and a location-discovery protocol.
Second, a language needs constructs to tie these
building blocks together into control JOBs. Examples
include constructs for flow control, such as loops and
conditional statements, constructs for variable handling, and
constructs for expression evaluation. We call all these
constructs the "net core" of the language, as they combine
several of the basic building blocks into actual control JOBs.
NesC [8], offering great modularity and portability, is
considered a suitable language for SmartMote. We
choose the NesC core to be the net core of the SmartMote
language. All the basic commands are defined as new NesC
commands using the standard method that NesC provides
for that purpose.
c) Code Transmitting and Updating
Fig. 4 shows the flow chart for a user using an
instruction to update sensor behavior. The instruction is
translated to byte code and distributed to the nodes in the
network. When we want to update a Function Node, the Leader
Node receives the byte code with the updating instruction and
routes the byte code to the Function Node according to its
routing table. After the Function Node receives the byte code,
it parses the code with SmartMote and updates its behavior.
SmartMote is built on TinyOS. Instead of installing
applications as binary objects on the sensor node, every node
executes a byte-code interpreter. SmartMote reads the special
byte-code commands from memory and transforms these
operations into TinyOS operations. Reinstallation and
rebooting are therefore not required if the program is just
input data for this system. The flash memory size is 1024 KB.
TinyOS, the Code Store, and the Data Store are allocated
128 KB; SmartMote, the Register, and the Temp Store are
allocated 64 KB.
d) Programmable Packet Format
The description above covers how a programmable
packet is generated and how nodes communicate and update
in the WASN. We now design a format for the programmable
packet. The programmable packet with the executable program is
Fig. 6: An example of SmartMote instruction execution
4.1 SmartMote Architecture
In order to achieve the goal of real-time updating or
computing, the loader is triggered to access the new code or
parameters from flash memory. The loader then loads the byte
code into SmartMote. Finally, SmartMote changes the node's
behavior after parsing and executing the byte code.
Fig. 3: Abstracted view of SmartMote's run-time environment
4.2 SmartMote Instructions
The instructions are used by the user to affect sensor
behavior. There are four types of instructions: computing
instructions, control instructions, system instructions, and
network instructions. Fig. 6 shows an example of
SmartMote instruction execution. We compile an instruction
into byte code and write the 4-bit byte code to a register.
SmartMote parses and executes the byte code in order to
affect the behavior of a node.
Fig. 4: User uses instruction to update sensor behavior.
5. Performance Analysis
In this section we present experiments as well as
simulations on the performance of SmartMote. The
experiments and measurements are conducted on a
hardware platform, Octopus II [7]. For the simulations, we
choose TOSSIM as the simulation platform.
Fig. 5: Programmable packet format
shown in Fig. 5, which gives the format of the programmable
packet. Restricted by TinyOS, only 29 bytes of packet
length can be used. GID, STA, SRC, and DES form the
basic header, while TYPE, LEN, and DATA/CODE
describe the information of the packet.
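A guess at how such a packet might be packed and unpacked follows; the field names come from Fig. 5, but the field widths (one byte each) and byte order are assumptions, not the paper's actual layout:

```python
import struct

MAX_PAYLOAD = 29  # TinyOS payload-length limit noted in the text
HEADER = struct.Struct("!BBBBBB")  # GID, STA, SRC, DES, TYPE, LEN (assumed widths)

def pack_packet(gid, sta, src, des, ptype, code):
    """Build a programmable packet: basic header plus DATA/CODE bytes."""
    if HEADER.size + len(code) > MAX_PAYLOAD:
        raise ValueError("DATA/CODE too long for a 29-byte TinyOS packet")
    return HEADER.pack(gid, sta, src, des, ptype, len(code)) + code

def unpack_packet(raw):
    """Parse a packet back into its fields and its DATA/CODE bytes."""
    gid, sta, src, des, ptype, length = HEADER.unpack_from(raw)
    return gid, sta, src, des, ptype, raw[HEADER.size:HEADER.size + length]
```

The LEN field lets the receiver recover the DATA/CODE segment, and the 29-byte check mirrors the TinyOS restriction mentioned above.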
4. SmartMote
SmartMote is a compact interpreter-like virtual
machine [6] designed specifically for WASNs, built on TinyOS.
5.1 Experimental Results and Analysis
The test bed is set up in a 30 x 30 m free space on our
campus. We consider four cases: 4, 8, 12, and 16 nodes in the
free space. After the sensor nodes update their behaviors, they
send their sensing data to the leader node. Fig. 7 shows the
traditional cases with the flooding method. When there are 16
nodes, within 60 seconds of operation of the network the
calculated value is 3840. From the results, we find the
measured value is 3222, a loss rate of 16%.
We model the power consumption of the reprogramming process with

P_Total = P_Radio + P_FlashAccess + P_Computing + P_SensorStartup,

where P_Radio is the power spent in transferring and receiving the JOB over the network, P_FlashAccess the power cost of reading and writing the JOB in flash ROM, P_Computing the power consumed by using instructions to compute data, and P_SensorStartup the required power for waking up the sensor node.
Fig. 7 & 8: Traditional case with flooding method & SmartMote scheme (packet rx).
Fig. 8 shows the results using the SmartMote scheme.
For each case considered in Fig. 7, one function node is
generated for every four sensor nodes. For example, for the
case of 16 nodes in Fig. 7, four function nodes are assigned.
Therefore, for the traditional cases of 4, 8, 12, and 16 nodes
(in Fig. 7), we have 1, 2, 3, and 4 function nodes
respectively (in Fig. 8). In the SmartMote scheme, the leader
node receives the results of computation from the function
nodes, and each function node manages 4 sensor nodes in the
network. With this scheme, we are able to reduce packet
collisions and decrease the number of transmitted packets in
the network. In packet reception, the SmartMote scheme is
more stable than the flooding scheme.
The component terms are

P_Radio = (B_D + B_P)(P_Tx + P_Rx),  P_FlashAccess = B_D (P_r + P_w),  P_Computing = B_D · P_Instruction.

TABLE 1 presents the parameters of power consumption. Based on the structure and power consumption of each component, the value of P_Total can be written as

P_Total = (B_D + B_P)(P_Tx + P_Rx) + B_D (P_r + P_w) + B_D · P_Instruction + P_SensorStartup,

which reduces to P_Total ≈ B_D (P_Tx + P_Rx + P_r + P_w + P_Instruction) when B_P and P_SensorStartup are negligible. If there are at most k packets in the experiment, the total becomes

P_Total = Σ_{i=1}^{k} B_D(i) (P_Tx + P_Rx + P_r + P_w + P_Instruction).

TABLE 1: Power consumption parameters.
Fig. 9: Traditional case with flooding method (power consumption).
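The per-packet sum can be evaluated directly; the parameter values below are invented for illustration, since the measured ones live in TABLE 1:

```python
def p_total(bd_per_packet, p_tx, p_rx, p_r, p_w, p_instr):
    """Sum B_D(i) * (P_Tx + P_Rx + P_r + P_w + P_Instruction) over the k packets.

    bd_per_packet: list of data-byte counts B_D(i), one per packet.
    The header and startup terms are dropped, matching the simplified form.
    """
    per_byte = p_tx + p_rx + p_r + p_w + p_instr
    return sum(bd * per_byte for bd in bd_per_packet)
```

Because the total scales with the number of data bytes sent, reducing transmitted packets (as the SmartMote scheme does) reduces P_Total proportionally.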
Evaluation of Power Consumption
In our system, raw data can be computed by
SmartMote in the sensor nodes. The processed data can be
collected by function nodes and then transmitted to the leader
node. This procedure decreases the amount of transmission
and thus also reduces the power consumption.
The following metric describes the power consumption
of the transmitter and receiver nodes when updating JOBs;
we use this data to evaluate the physical layers of a
sensor network.
In the formula above, we adopt the results from Figs. 7
and 8 to evaluate the power consumption in the WASN. For
the example, we assume a 30 x 30 m free space, = 1 [9],
and = 8.103 mA. Figs. 9 and 10 show the results of the power
consumption. Under the same environmental conditions, we find
that the SmartMote scheme can reduce power consumption by
up to 72%.
to achieve the goals of power saving and behavior
update. The SmartMote system makes WASN platforms open to
transient users with dynamic needs. This fact, apart from
giving an important flexibility advantage to deployed
systems, greatly facilitates researchers in evaluating
algorithms/protocols on real platforms.
Fig 10: SmartMote scheme (power consumption).
Assume the leader node transmits a job to 100 sensor
nodes to update their behavior. The sensor nodes react
according to the job and send their sensing data to the leader
node through the function nodes.
Fig. 11 shows the distribution of completion times for
individual nodes. All nodes have a completion time bigger
than 1 second but less than 15 seconds; we note that the
average range of completion time for an individual node is
0.15 ± 0.05 second. Fig. 12 presents the final results, revealing
that the SmartMote scheme is considerably faster than the
traditional cases with the flooding scheme. Furthermore, the
breakdown of the delay in the two schemes shows that the
flooding scheme spent more time in both the communication
part and the computation part than the SmartMote scheme. In
the flooding scheme, because its packet loss rate is higher than
the SmartMote scheme's, more retransmissions are required.
Moreover, the flooding scheme needs centralized computation
in the leader node, so it also spends much time in the
computation part.
Fig 12: Total delay breakdown for two schemes.
This study is conducted under the "Applied Information
Services Development & Integration project" of the
Institute for Information Industry, which is subsidized by
the Ministry of Economic Affairs of the Republic of China.
6.1 References
[1] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, K. Pister,
“System Architecture Directions for Networked Sensors,”
ASPLOS-IX proceedings, Cambridge, MA, USA, Nov. 2000
[2] Crossbow Technology, “Mote in Network Programming
[3] T. Stathopoulos, J. Heidemann, and D. Estrin, “A Remote
Code Update Mechanism for Wireless Sensor Networks,”
CENS Technical Report No. 30, Nov. 2003
[4] P. Levis, N. Patel, D. Culler, and S. Shenker, “Trickle: A
Self-Regulating Algorithm for Code Propagation and
Maintenance in Wireless Sensor Networks,” NSDI’04, pp.
15-28, 2004
[5] A. Boulis, C.-C. Han, and M.B. Srivastava, “Design and
Implementation of a Framework for Efficient and
Programmable Sensor Networks,” MobiSys’03 Proceedings,
pp. 187-200, New York, NY, USA, 2003
[6] P. Levis and D. Culler, “Mate: A Tiny Virtual Machine for
Sensor Networks,” ASPLOS X Proceedings, 2002
[7] Moteiv, “Tmote Sky Data Sheet,” 2006
[8] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, D.
Culler, “The nesC Language: A Holistic Approach to
Networked Embedded Systems,” ACM PLDI’03, San Diego,
CA, USA, Jun. 2003
[9] J. Ammer and J. Rabaey, "The Energy-per-Useful-Bit Metric for
Evaluating and Optimizing Sensor Network Physical Layers"
Fig 11: The distribution of completion times for individual nodes.
6. Conclusions
In this paper, an application to update WASNs with
programmable packets and SmartMote is designed and
implemented for TinyOS on the SmartMote platform. We
present our framework for dynamic and efficient WASN
programmability. Through our implementation we are able
A RSSI-based Algorithm for Indoor Localization
Using ZigBee in Wireless Sensor Network
Yu-Tso Chen1, Chi-Lu Yang1,2, Yeim-Kuan Chang1, Chih-Ping Chu1
Department of Computer Science and Information Engineering, National Cheng Kung University
Innovative DigiTech-Enabled Applications & Service Institute, Institute for Information Industry
Tainan, Taiwan R.O.C.
Kaohsiung, Taiwan R.O.C.
{ p7696147, p7896114, ykchang, chucp}
Keywords: indoor localization, home automation,
ZigBee modules, wireless sensor networks
environment. The RSSI value can be regularly measured
and monitored to calculate the distance between objects.
Time of arrival (TOA) refers to the travel time of a radio
signal from a single sender to a remote receiver.
By computing the signal transmission time between a
sender and a receiver, the distance can be approximately
estimated. Time difference of arrival (TDOA) is
computed based on the emitted signals from three or
more synchronized senders; it also refers to a solution for
locating a mobile object by measuring the TDOA.
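One common way to turn an RSSI reading into a distance estimate is the standard log-distance path-loss model; this is a textbook model, not necessarily the mapping the paper's CC2431 hardware uses, and the reference values below are illustrative:

```python
def rssi_to_distance(rssi, rssi_d0=-40.0, d0=1.0, n=2.0):
    """Estimate distance (in the units of d0) from an RSSI reading in dBm.

    Log-distance path-loss model: RSSI(d) = RSSI(d0) - 10*n*log10(d/d0),
    inverted to d = d0 * 10 ** ((rssi_d0 - rssi) / (10 * n)).
    rssi_d0: RSSI measured at the reference distance d0.
    n: path-loss exponent, environment-dependent (about 2 in free space).
    """
    return d0 * 10 ** ((rssi_d0 - rssi) / (10 * n))
```

With n = 2, a reading 20 dB weaker than the 1 m reference corresponds to a tenfold distance, which is why regularly monitored RSSI can serve as a rough ranging signal.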
In this paper, we investigate RSSI solutions for
indoor localization, propose a new RSSI-based
algorithm, and implement it using ZigBee CC2431
modules in a wireless sensor network. The rest of this
paper is organized as follows. In Section 2, we briefly
introduce the related work on indoor localization in
WSNs. In Section 3, we first define the relevant arguments
used to describe our algorithm, and then carefully explain the
proposed algorithm. In Section 4, the experimental
results are analyzed and discussed to validate our
algorithm; we show that our algorithm is more accurate by
comparing it with other methods. The conclusion and
future work are summarized in Section 5.
1. Introduction
2. Related Work
For a large number of applications in home
automation, the service system requires precise sensing
of users' locations by certain sensors. Moreover, the
system sometimes needs to recognize the time and the
weather when making decisions. On the other hand, users
always hope to be served correctly and suitably by
the service system in the house. For satisfying the users'
demands, one of the key success factors is to
accurately estimate the user's location. It is considered
a challenge to automatically serve a mobile user in the house.
Indoor localization cannot be carried out effectively by
the well-known Global Positioning System (GPS), which
is subject to blockage in urban and indoor
environments [1-4]. Thus, in recent years, Wireless
Sensor Networks (WSNs) have been popularly used to locate
mobile objects in indoor environments. Several physical
features are widely discussed for solving indoor localization
in WSNs. Received signal strength indication (RSSI) is
the power strength of the radio frequency in a wireless
ZigBee solutions are widely applied in many areas,
such as home automation, healthcare, and smart energy
(ZigBee Alliance). ZigBee is a low-cost, low-power,
low-data-rate wireless mesh networking standard
originally based on the IEEE 802.15.4-2003 standard for
wireless personal area networks (WPANs). The original
IEEE 802.15.4-2003 standard has been superseded by the
publication of IEEE 802.15.4-2006, which extends its
features [5, 14]. While many techniques related to
ZigBee have also been applied to indoor localization, we
focus on two-dimensional localization issues in the
following introduction.
For the various applications in home automation, the
service system requires to precisely estimate user’s
locations by certain sensors. It is considered as a
challenge to automatically serve a mobile user in the
house. However, indoor localization cannot be carried
out effectively by the well-know Global Positioning
System (GPS). In recent years, Wireless Sensor
Networks (WSNs) are thus popularly used to locate a
mobile object in an indoor environment. Some physical
features are widely discussed to solve indoor localization
in WSN. In this paper, we inquired about the RSSI
solutions on indoor localization, and proposed a Closer
Tracking Algorithm (CTA) to locate a mobile user in the
house. The proposed CTA was implemented by using
ZigBee CC2431 modules. The experimental results show
that the proposed CTA can accurately determine the
position with error distance less than 1 meter. At the
same time, the proposed CTA has at least 85% precision
when the distance is less than one meter.
2.1 Fingerprinting
Fingerprinting (FPT) systems are built by analyzing RSSI features. The RSSI features are pre-stored in a database and are approximately retrieved to locate a user's position [8-11]. The key step of FPT is that the blind node is put at pre-defined anchor positions in advance. The blind node continuously sends requests to its surrounding reference nodes and receives RSSI responses from these reference nodes. The FPT system then continuously records these responses and analyzes their features until the analyzed results are characteristically stable. In general, different anchor positions should be distinguishable by their RSSI features. In FPT, the mobile object is approximately located by comparing the current RSSI with the pre-stored RSSI features.

Denote a series of offline training measurements of reference node k at location L_ij as L = [l_ijk^0, ..., l_ijk^(M-1)], which enables computing the histogram h of the RSSI:

    h_ijk(ζ) = Σ_{m=0}^{M-1} δ(l_ijk^m − ζ),  −255 ≤ ζ ≤ 0    (1)

Here the reference nodes are indexed by k, and δ represents the Kronecker delta function [8, 11].
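As an illustrative sketch only (not the authors' implementation), the offline RSSI histogram described above can be computed as follows; the array `l` holds the M training RSSI samples for one reference node at one location, with values assumed to lie in [−255, 0]:

```c
#include <assert.h>

#define RSSI_RANGE 256  /* possible RSSI values: -255 .. 0 */

/* Build the histogram h(z) = sum over m of delta(l_m - z).
   h[255 + z] counts how many of the M training samples equal z. */
void build_rssi_histogram(const int *l, int M, int h[RSSI_RANGE])
{
    for (int z = 0; z < RSSI_RANGE; z++)
        h[z] = 0;
    for (int m = 0; m < M; m++)
        h[255 + l[m]]++;   /* Kronecker delta: increment the matching bin */
}
```

An FPT system would store one such histogram per reference node and anchor location, and then match live RSSI readings against the stored distributions.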
2.2 Real-Time Tracking

The method that can locate a mobile object using at least three reference nodes, without a pre-trained database, is named Real-Time Tracking (RTT) [1-4, 6-7]. An RTT system can convert the RSSI to a distance by specific formulas. Trilateration is a method to determine the position of an object based on simultaneous range measurements from at least three reference nodes at known locations [1]. Trilateration requires the coordinates of at least three reference nodes (x_i, y_i) and the distances d_P^i between the blind node and the pre-positioned reference nodes. The target's position P(x_p, y_p) can be obtained by minimum mean square error (MMSE) estimation [3]. The distance between a reference position i and the mobile object p is defined by Eq. (2):

    d_P^i = sqrt((x_i − x_p)^2 + (y_i − y_p)^2)    (2)

Eq. (2) can be transformed into

    (d_P^i)^2 = (x_i − x_p)^2 + (y_i − y_p)^2    (3)

Then Eq. (3) can be transformed into

    [ (d_P^1)^2 − (d_P^2)^2 + (x_2^2 + y_2^2 − x_1^2 − y_1^2) ]   [ 2(x_2 − x_1)   2(y_2 − y_1) ]
    [ (d_P^1)^2 − (d_P^3)^2 + (x_3^2 + y_3^2 − x_1^2 − y_1^2) ] = [ 2(x_3 − x_1)   2(y_3 − y_1) ]  [ x_p ]
    [                          ...                            ]   [      ...            ...     ]  [ y_p ]
    [ (d_P^1)^2 − (d_P^N)^2 + (x_N^2 + y_N^2 − x_1^2 − y_1^2) ]   [ 2(x_N − x_1)   2(y_N − y_1) ]
                                                                                                   (4)

Writing the left-hand vector as b and the coefficient matrix as A, Eq. (4) is transformed into Eq. (5), which can be solved using the matrix solution given by Eq. (6):

    b = A [x_p; y_p]    (5)

    [x_p; y_p] = (A^T A)^(−1) (A^T b)    (6)

Position P(x_p, y_p) can be obtained by calculating Eq. (6).

3. Proposed Algorithm

3.1 Definitions

A blind node refers to a mobile object. A reference node is a fixed node that responds with its RSSI to assist in locating the blind node. In this study, both the blind node and the reference nodes are ZigBee modules. In order to describe our proposed algorithm, the following terms are defined. These terms are categorized into primitive terms, original physical terms and derived terms. The primitive terms are defined as follows:

Nneighbor = the number of reference nodes currently close to the blind node within one hop

BID = a pre-defined identification of a blind node, which is a mobile object

RID = a pre-defined identification of a reference node (a fixed object), where 1 ≤ RID ≤ Nneighbor

Rthreshold[RID][d] = the RSSI of RID within the pre-defined threshold at distance d, where distance d is a set = {d(m) | 0.5, 1, 1.5, 2.0, 2.5, 3.0}

MACA = the mode of approximately closer approach for tracking (the improved algorithm)

MRTT = the mode of Real-Time Tracking

MC = the current localization mode = {MC | MACA, MRTT}

The values of the RSSI thresholds of each RID within distance d are pre-trained and stored in the database. The physical terms, which are originally received from the ZigBee blind node, are defined as follows:

Rnow(x) = the current value of the measured RSSI of x, where variable x refers to an RID

rid = an index of Rnow, where rid < Nneighbor

The derived terms, whose values are calculated from the physical terms and the primitive terms, are defined as follows:

CloserList[x] = a list of RIDs sorted by Rnow(x), where Rnow(x) is within Rthreshold[x][d] and Rnow(x) ≤ Rnow(x−1), 1 ≤ x ≤ Nneighbor

SortedList[x] = a list of RIDs sorted by Rnow(x), where Rnow(SortedList[x]) ≤ Rnow(SortedList[x−1])

ClosestRID = a rid that refers to the RID closest to the blind node (the mobile object, BID), where Rnow(ClosestRID) is within the pre-defined threshold

CR = a record for tracking the mobile object
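The least-squares trilateration of Eqs. (4)-(6) can be sketched as follows; this is illustrative code, not the authors' implementation. The arrays `x`, `y` hold the reference-node coordinates and `d` the measured ranges d_P^i:

```c
#include <assert.h>
#include <math.h>

/* Solve [xp yp]^T = (A^T A)^{-1} A^T b, with A and b built
   row-by-row as in Eq. (4). Returns 0 on success, -1 if
   A^T A is singular (e.g. collinear reference nodes). */
int trilaterate(const double *x, const double *y, const double *d,
                int N, double *xp, double *yp)
{
    double ata[2][2] = {{0, 0}, {0, 0}};  /* A^T A (2x2) */
    double atb[2] = {0, 0};               /* A^T b (2x1) */

    for (int i = 1; i < N; i++) {
        double a0 = 2.0 * (x[i] - x[0]);
        double a1 = 2.0 * (y[i] - y[0]);
        double bi = d[0]*d[0] - d[i]*d[i]
                  + x[i]*x[i] + y[i]*y[i] - x[0]*x[0] - y[0]*y[0];
        ata[0][0] += a0 * a0;  ata[0][1] += a0 * a1;
        ata[1][0] += a1 * a0;  ata[1][1] += a1 * a1;
        atb[0] += a0 * bi;     atb[1] += a1 * bi;
    }
    double det = ata[0][0]*ata[1][1] - ata[0][1]*ata[1][0];
    if (fabs(det) < 1e-12) return -1;
    *xp = ( ata[1][1]*atb[0] - ata[0][1]*atb[1]) / det;
    *yp = (-ata[1][0]*atb[0] + ata[0][0]*atb[1]) / det;
    return 0;
}
```

With exact ranges from three non-collinear reference nodes, the result reproduces the true position; with noisy RSSI-derived ranges it yields the least-squares estimate.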
3.2 Closer Tracking Algorithm
The FPT locating style has its own specific advantages and disadvantages, and so does the RTT style. The features of the two styles are characteristically complementary. Therefore, we propose a compound algorithm that selects the suitable mode at the appropriate time, and we improve the FPT algorithm as well. This idea also emerged from our observation of elderly persons in the house. The elderly usually stay at the same positions, such as the sofa, the table, the water cooler or the bed. They even frequently stay in front of the television or near the door for a long time. While they are in the house, the time they spend moving is much less than the time they spend staying. Since we aim to provide automatic applications suitable for the elderly in their houses, we can design a position tracking algorithm based on the above observation. The proposed Closer Tracking Algorithm (CTA) was specifically designed to improve such automatic applications. The CTA is carried out in the following four steps.
Step1 – [Build Neighbor List]
The blind node BID (the mobile object) periodically receives RSSI values (Rnow) from its neighbor nodes (RIDs) by broadcasting requests. The neighbor nodes are recorded by comparing their RSSIs with the pre-defined thresholds (Rthreshold). In other words, if the RSSI of an RID is within the Rthreshold at distance d, the RID is stored into the CloserList.
Fig. 1 Concept and flow of the Proposed Algorithm
Table 1 The pseudo code of the CTA
Algorithm_Closer_Tracking(int *Rnow)
{
    short CloserList[8] = {-1};   // candidate reference nodes
    int k = 0;                    // number of records in CloserList
    const int row = 3;
    const int col = 2;

    ////// Step 1 - Build Neighbor List //////
    for (float dis = 0.5; dis <= 2.0; dis += 0.5) {
        for (int rid = 1; rid <= Nneighbor; rid++) {
            if (Rnow[rid] within Rthreshold[rid][dis]) {
                CloserList[k++] = rid;
            } // end if
        } // end for
    } // end for

    ////// Step 2 - Determine Mode //////
    if (k == 0) {        // no record in the CloserList
        MC = MRTT;       // change to Real-Time Tracking mode
        return;
    } // end if

    ////// Step 3 - Adapt Assistant Position //////
    ////// (only ClosestRID is in the CloserList) //////
    if (k == 1) {
        for (int x = 1; x < Nneighbor; x++) {
            CloserList[x] = SortedList[x-1];
        } // end for
        k = Nneighbor;
    } // end if

    ////// Step 4 - Approximately Closer Approach //////
    ClosestRID = CloserList[0];
    for (int s = 0; s < k; s++) {   // FPT
        switch (CloserList[s+1] - ClosestRID) {
            case 1:    CR[s] = R2; break;
            case -1:   CR[s] = R4; break;
            case col:  CR[s] = R3; break;
            case -col: CR[s] = R1; break;
            default:   break;       // the other four directions
        } // end switch
    } // end for
    MC = MACA;
} // end Closer Tracking Algorithm
Step2 – [Determine Mode]
If there are records stored in the CloserList, the improved FPT is executed to locate the mobile object. Otherwise, if there is no record in the CloserList, the RTT is executed to locate the mobile object.
Step3 – [Adapt Assistant Position]
It is likely that there is only one record in the CloserList. If this special situation occurs, we need an extra data structure, the SortedList. The SortedList is an array that stores the RIDs ordered by their received RSSIs. Nevertheless, the closest RID (ClosestRID) should not be stored in the SortedList. In the next step, the CloserList and the SortedList are used to locate the mobile object more precisely under the MACA mode.
Step4 – [Approximately Closer Approach]
The improved FPT, named the approximately closer approach (ACA), is divided into two phases. In the first phase, the ClosestRID is used to figure out a circular range, since the RSSI of the ClosestRID is within the pre-defined threshold at distance d. The plane of the ClosestRID range can be conceptually divided into four sub-planes. In the second phase, the RIDs in the CloserList are iteratively retrieved to select the sub-planes that narrow down the outer range. For example, assume CloserList = {Ref4, Ref1, Ref5} and ClosestRID = Ref3. In Fig. 1, a virtual circle surrounding the node Ref3 is first figured out, since the ClosestRID refers to Ref3. The plane of the Ref3 range can be conceptually divided into four sub-planes, R1, R2, R3 and R4. In the second round, the sub-plane R2 will be selected, since Ref4 is the first RID in the CloserList. The other RIDs in the CloserList are iteratively selected to narrow down the range. The iteration stops when the CloserList is empty. The pseudo code of the CTA, which contains the ACA, is shown in Table 1.
4. Implementation and Experiment
ZigBee modules are used in this experiment. A CC2431 chip serves as the blind node, and CC2430 chips serve as the reference nodes. The specific features of these chips are listed in Table 2, and the CC2431 module is shown in Fig. 2. The RSSI values are measured over a long term in the experiment, and all the values are stored in a database for further analysis. The proposed CTA is programmed in C#.NET.
Fig. 3 RSSI thresholds
4.1 Findings
We measured 1-D RSSI in different environments, in which electromagnetic waves are isolated, absorbed or normal. In Fig. 3, the x-axis represents the various distances between a blind node and a reference node (0.5, 1, 1.5, 2.0, 2.5 and 3 meters), and the y-axis represents the measured RSSI values. The RSSI values are measured until the statistical results are stable. In order to observe the data conveniently, all the measured values are offset by one hundred. The statistical results and the standard deviations of the stable RSSI are shown in Fig. 3. These values are further utilized to define the thresholds.
The following formula, provided by Texas Instruments (TI), represents the relationship between the RSSI and the estimated 1-D distance:

    RSSI = −(10 n log10(d) + A)    (9)

where n is a signal propagation constant (exponent), d is the distance from the blind node to the reference node, and A is the received signal strength at a distance of 1 meter. According to formula (9), the 1-D distance d can be derived from the measured RSSI values of Fig. 3, as shown in Fig. 4.

Fig. 4 Actual distance and derived distance (A, n) with Isolated (6, 4); Absorb (45, 10); Normal (30, 9)

4.2 Experimental Results

In this experiment, an actual position is represented by the coordinate (x, y), and an estimated position is represented by the coordinate (i, j). Therefore, we can define the accuracy by the Error Distance formula:

    Dist(L_xy, L_ij) = sqrt((x − i)^2 + (y − j)^2)
Table 2 Features of CC2431
Radio Frequency Band
Chip Rate(kchip/s)
Bit rate(kb/s)
Data Memory
Program Memory
Spread Spectrum
In order to validate the accuracy of the proposed CTA, we implemented and compared it with the FPT [9] and the RTT [12], both experimented with using the CC2431 location engine. The experimental results are shown in Fig. 5 and Fig. 6. The x-axis represents the distance from the blind node to the closest reference node, and the y-axis represents the difference between the actual position and the estimated position.
Fig. 5 Estimation errors at distance {0.5, 1.0, 1.5}
meters (Accuracy)
Fig. 2 CC2431 module
Fig. 6 Estimation errors at distance {2.0, 2.5, 3.0}
meters (Accuracy)
Fig. 7 Precision when error distance within 1.0 m
As we can see from the experimental results in Fig. 5, when a blind node approaches any reference node, our algorithm can accurately determine the position with an error distance of less than 1 meter. The accuracy of the CTA is better than that of the other methods. At the same time, the FPT method is accurate enough when the blind node moves close to the pre-trained positions. Furthermore, the estimation errors calculated by the CC2431 are quite stable in Fig. 5, and the accuracy of the RTT method is quite independent of the positions of the reference nodes.

In Fig. 6, the distances from the blind node to the closest reference node are increased. Therefore, the RSSI values suffer more interference from background noise, and their variances increase. In the FPT method, the signal features are diminished, so the estimation errors increase markedly. In other words, the FPT method cannot determine the position accurately when the distance from the blind node to the closest reference node is more than two meters. Under this condition, our proposed CTA changes the operational mode from the ACA to the RTT mode. As a result, the accuracy of the proposed method is close to that of the RTT method. In the case of x = 2.0 m, the proposed CTA is slightly more accurate than the RTT method, while in the case of x = 3.0 m, the proposed CTA is slightly worse than the RTT method.
In Fig. 7 and Fig. 8, we show the precision of the proposed CTA, the FPT, and the RTT. The precision is defined as follows:

    Precision = (number of estimates within the acceptable error distance) / (total number of estimates)

Fig. 8 Error distance {x} within 1.3 m; error distance {y, z} within 1.7 m

For the experimental design in Fig. 7, the acceptable error distance is set to 1 meter. Under this condition, the estimation errors whose values are less than or equal to 1 meter are selected to calculate the precision. As we can see, the proposed CTA achieves at least 85% precision when the distance is less than one meter, and its precision is higher than that of the other methods. In Fig. 8, the precision is still low in the case of x = 2.5; this is because most estimation errors fall in the range of 1.5 to 1.8 m.

We also show the mode-changing functionality of the proposed CTA at various distances. The usage ratios of the ACA and the RTT modes are displayed in Fig. 9. As we can see, the ACA mode is useful when the distance is less than 1.5 meters, and the mode changes to RTT when the distance increases beyond 1.5 meters. The mode change can be made according to the thresholds we set. As a result, the proposed CTA can select an adaptive mode to obtain a more precise location.

Fig. 9 Usage ratio of ACA & RTT modes
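The precision measure above can be sketched as follows (illustrative only); `err` holds the error distances of the individual estimates:

```c
#include <assert.h>

/* Precision = (number of estimates within the acceptable error
   distance) / (total number of estimates). */
double precision(const double *err, int total, double acceptable)
{
    int within = 0;
    for (int t = 0; t < total; t++)
        if (err[t] <= acceptable)
            within++;
    return (double)within / (double)total;
}
```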
5. Conclusion and Future Work
In this paper, we inquired into the RSSI solutions for indoor localization and proposed a new RSSI-based algorithm using ZigBee CC2431 modules in a wireless sensor network. Moreover, we improved the FPT algorithm. The mode-changing operation of the proposed CTA is designed to combine the improved FPT and the RTT methods; it adapts the operational mode according to the thresholds we set, as described in the findings. As a result, the proposed CTA can suitably select an adaptive mode to obtain more precise locations. The experimental results show that the proposed CTA can accurately determine the position with an error distance of less than 1 meter. At the same time, the proposed CTA achieves at least 85% precision when the distance is less than one meter.

For the various applications in home automation, the proposed CTA can be applied to provide correct and suitable services by estimating users' locations precisely. In the future, the proposed CTA could even bring a promising quality of service for caring for the elderly in the house. At the same time, we will try to improve the real-time tracking part of the CTA to increase the accuracy in uncovered ranges, i.e., at positions beyond the reference nodes.
Acknowledgment

This research was partially supported by the second Applied Information Services Development and Integration project of the Institute for Information Industry (III) and sponsored by MOEA, Taiwan R.O.C.

References

[1] Erin-Ee-Lin Lau, Boon-Giin Lee, Seung-Chul Lee, and Wan-Young Chung, "Enhanced RSSI-Based High Accuracy Real-Time User Location Tracking System for Indoor and Outdoor Environments," International Journal on Smart Sensing and Intelligent Systems, Vol. 1, No. 2, June 2008.
[2] Youngjune Gwon, Ravi Jain, and Toshiro Kawahara, "Robust Indoor Location Estimation of Stationary and Mobile Users," in Proc. of IEEE INFOCOM, March 2004.
[3] Masashi Sugano, Tomonori Kawazoe, Yoshikazu Ohta, and Masayuki Murata, "Indoor Localization System Using RSSI Measurement of Wireless Sensor Network Based on ZigBee Standard," in Proc. of Wireless Sensor Networks 2006 (WSN 2006), July 2006.
[4] Stefano Tennina, Marco Di Renzo, Fabio Graziosi, and Fortunato Santucci, "Locating ZigBee Nodes Using the TI's CC2431 Location Engine: A Testbed Platform and New Solutions for Positioning Estimation of WSNs in Dynamic Indoor Environments," in Proc. of the First ACM International Workshop on Mobile Entity Localization and Tracking in GPS-less Environments (MELT 2008), Sep. 2008.
[5] IEEE 802.15 WPAN™ Task Group 4, IEEE 802.15.4.
[6] Allen Ka Lun Miu, "Design and Implementation of an Indoor Mobile Navigation System," Master's thesis, Dept. of Computer Science, MIT, 2002.
[7] P. Bahl and V. Padmanabhan, "RADAR: An In-Building RF-based User Location and Tracking System," in Proceedings of IEEE INFOCOM 2000, March 2000.
[8] Angela Song-Ie Noh, Woong Jae Lee, and Jin Young Ye, "Comparison of the Mechanisms of the ZigBee's Indoor Localization Algorithm," in Proc. SNPD, pp. 13-18, 2008.
[9] Qingming Yao, Fei-Yue Wang, Hui Gao, Kunfeng Wang, and Hongxia Zhao, "Location Estimation in ZigBee Network Based on Fingerprinting," in Proc. IEEE International Conference on Vehicular Electronics and Safety, Dec. 2007.
[10] Shashank Tadakamadla, "Indoor Local Positioning System for ZigBee Based on RSSI," M.Sc. thesis report, Mid Sweden University, 2006.
[11] C. Gentile and L. Klein-Berndt, "Robust Location Using System Dynamics and Motion Constraints," in Proc. of the 2004 IEEE International Conference on Communications, Vol. 3, pp. 1360-1364, June 2004.
[12] System-on-chip for 2.4 GHz ZigBee/IEEE 802.15.4 with Location Engine, Texas Instruments, July 2007.
[13] K. Aamodt, CC2431 Location Engine, Application Note AN042, Texas Instruments.
[14] ZigBee Alliance, ZigBee Specification Version r13, San Ramon, CA, USA, Dec. 2006.
A Personalized Service Recommendation System
In a Home-care Environment
Chi-Lu Yang1,2, Yeim-Kuan Chang1, Ching-Pao Chang3, Chih-Ping Chu1
Department of Computer Science and Information Engineering, National Cheng Kung University
Innovative DigiTech-Enabled Applications & Service Institute, Institute for Information Industry
Department of Information Engineering, Kun Shan University
Tainan, Taiwan R.O.C.
Kaohsiung, Taiwan R.O.C.
[email protected], {ykchang, chucp}, [email protected]
In this paper, we develop a personalized service recommendation system based on patients' preferences in a home-care environment. For the recommendation services, we first explore the process of generating recommendable services. We then construct personal models by analyzing the patients' activity patterns. Through the personal models, the system is able to automatically launch safety alerts, recommendable services and healthcare services in the house. The proposed system and models could even carry out mobile health monitoring and promotion. The rest of this paper is organized as follows. In Section 2, we introduce recommendation and personalization services. In Section 3, the proposed system and service groups are described, and the processes of generating recommendable services are also mentioned. In Section 4, the personal models are explained in detail. The experiments and test cases are discussed in Section 5. The conclusion of the study is summarized in Section 6.
Keywords: personalized service, home-care system, service-oriented architecture

1. Introduction

Many bio-signals of chronic patients can be measured and transferred over the wireless network through the homebox [1], [18]. However, the number of bio-devices has been gradually increasing, and managing these devices is thus becoming much more complex. In this situation, the performance level decreases when a large number of data changes occur. In addition, chronic patients always hope to be served correctly and suitably by the service system in their houses. Therefore, we have to provide services such as adjusting room temperature and lighting to make those patients' daily lives easier. If the system can actively predict a patient's preferences or habits, it will be able to serve the person in advance with a high quality of service.

Many bio-signals of chronic patients can be measured by various bio-devices and transferred to the back-end system over the wireless network through the homebox. In a home-care environment, reliably transmitting and receiving these bio-signals through the homebox becomes more complex, and as the bio-devices increase, the process becomes even more so. In this paper, we propose a personalized service recommendation system (PSRS) based on users' preferences and habits. The PSRS is capable of providing suitable services. Furthermore, we construct personal models to record the patients' daily activities and habits. Through these models, the system is able to automatically launch safety alerts, recommendable services and healthcare services in the house. In the future, the proposed system and models could even carry out mobile health monitoring and promotion in a home-care environment.

2. Related Work

2.1 Recommendation Service

Recommendable services are popularly applied on the Internet, such as on-line recommendation services, customized services, personalized advertisements, and other similar services [2]. By retrieving and analyzing the interactions between users and systems, recommendation services can be delivered precisely. Recommendable services are sometimes generalized to match personal preferences [3]. In order to fit in with users' demands, services are personalized and recommended based on the users' preferences and contexts. Studies of applied systems have shown that recommendations based on a user's habits receive friendly user responses [4]-[6]. These results agree with studies in the human-computer interaction and e-learning domains.

A recommendation system can even provide custom-oriented services, which differ from those of a traditional service system. The services of the system can be personalized according to personal profiles. In order to achieve this goal, the primary step is to collect various information sources. These sources can be roughly classified into two types. The first is user-relevant information, such as name, birthday, health status, habits, and behavior patterns. The second type comes from the environment, such as the statuses of devices, interactions between users and devices, the weather, the time, temperature, brightness and others. These two kinds of sources are the primary foundations for building personalization services. Unfortunately, some sources are dynamically changed by external factors. Furthermore, users' demands are even too diverse to be monitored effectively. As a result, it is a challenge to recommend suitable services to a user.
2.3 Interface Management and Query

Interface management is a mechanism for managing and providing various services to others. Web services are one of the most popular techniques nowadays. Through web services, various services are distributed across different systems and managed individually. The World Wide Web Consortium defines three major roles for web services: (1) the service provider, which provides remote services; (2) the service registry, which provides registration and publication; and (3) the service consumer, which requests and receives the services. First, the service provider generates service descriptions and registers them with the service registry. The service consumer then queries the service registry and receives the interface descriptions. The relevant techniques are the Web Services Description Language (WSDL) [13], the Simple Object Access Protocol (SOAP) [14], Universal Description, Discovery, and Integration (UDDI) [15], and the Extensible Markup Language (XML) [12].

In business applications, web services have proven to be composable in complex manners [16]-[17]. Moreover, IBM WebSphere supports standardized web services and cooperates with the Microsoft workflow tool. The BEA WebLogic server not only supports web services and XML but also composes new services. Web services are extendable techniques, especially for developing a large system.
2.2 Personalization Service

If we wish to properly recommend services to a user, we should not only pay attention to the data sources but also concentrate on personalization. For service personalization, the key factor is sensing the user's preferences and habits. Through these personal patterns, existing services can be adapted to match the user's needs. A recommendable service system can be examined from the following viewpoints: user modeling, context modeling, semantic interoperability and service composition, self-service management, and so on. Using user models to predict users' needs is one of the most popular methods. An excellent user model is able to select the proper attributes for exploring the user's behavior patterns [7]. The recommendable services can then be dynamically composed and properly provided based on the users' patterns in specific environments.

The overlay model is a modeling technique based on collecting users' behaviors [8]. The primary idea of the overlay model is that a user's behavior is a subset of all users' behaviors. Therefore, a common model can be built by generalizing all users' behaviors. An individual model can then be established by comparing it with the common model.

The stereotype user model is a speedy modeling technique. The model can be fundamentally built up even when the user's behavior component is lacking [9], [10]. Although the model is built from approximate values, it performs effectively in many applications [11]. In order to build the stereotype model, the following elements are needed: user subgroup identification (USI), identification of users' key features (IUF), and representation templates (RT). The first element is used to identify the subgroups' features; users in a subgroup have application-relevant features in their behavior models. The second element is used to define the users' key features, which differ from those of the other subgroups. Furthermore, the presence and absence of features should be clearly identified for decision support. The third element is hierarchically represented, and the representations should be distributed in different systems. The representation templates in subgroups are named stereotypes. The hierarchical style can precisely describe the user's behaviors as it goes down to lower hierarchical nodes.
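As an illustrative sketch only (none of these structures appear in the paper), a stereotype pairing a user subgroup with the presence or absence of its key features, plus a simple matching score for assigning a user to the best subgroup, might look like:

```c
#include <assert.h>

#define MAX_FEATURES 8

/* A stereotype: a representation template for one user subgroup,
   recording the presence/absence of its key features (IUF). */
struct stereotype {
    const char *subgroup;               /* USI: subgroup identifier */
    int feature_present[MAX_FEATURES];  /* 1 = presence, 0 = absence */
    int n_features;
};

/* Count how many of a user's observed features agree with the
   stereotype; the highest-scoring subgroup is the best match. */
int match_score(const struct stereotype *s, const int *user_features)
{
    int score = 0;
    for (int i = 0; i < s->n_features; i++)
        if (s->feature_present[i] == user_features[i])
            score++;
    return score;
}
```

A real system would refine this with weighted features and hierarchical templates, as the hierarchical representation described above suggests.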
3. System Architecture
Service Oriented Architecture (SOA) is an emerging architectural style. The major ideas of SOA are that service elements are granularly defined and constructed, that service interfaces are clearly standardized for composing new services, and that services built following SOA are reusable. By composing services iteratively, a new system can be formed to serve a specific domain. A SOA-based system usually includes three key features: software components, service elements and business processes. Web services are one of the most important ways to implement SOA; they rely on XML-based techniques such as XML, WSDL, SOAP, and UDDI.

The proposed personalized service recommendation system (PSRS) was built following the SOA principles. The service elements in the PSRS are distributed across different sub-systems. In the PSRS, web services evolve over three generations. First, a number of simple web services are implemented, usually used for query and response. Second, composite web services are derived from the simple ones to form more complex applications. Third, collaborative web services continually emerge; these dynamic services can automatically support business agility. The architecture of the PSRS is thus flexible and extendable.
models. For example, when a person moves close to
certain facilities, this represents the possibility of use of
the facilities. A person who moves from one position to
another also represents specific activities, such as
entering or exiting a room. Even a person who keeps
motionless for a period of time would possibly represent
some meanings. Furthermore, moving speed, pattern, and
displacement are also key factors for modeling the
person’s behaviors.
(3) Environment Services (ENS): The environmental services publish the context statuses and provide query services for the other services. Through the ENS, the other services can get the contextual statuses for further recommendable control; for example, the contexts include date, time, temperature, brightness, weather, noise and so on. The environmental devices can also be controlled by the ENS, since the device conditions can be simply queried. Furthermore, services can thus be recommended to automatically control the devices to fit users' preferences.
(4) Management Services (MGS): These services are responsible for managing the other services and some functionality. The other services are registered and published in the UDDI server and managed by the MGS. The users' authority in the PSRS is managed by the MGS as well.
(5) Personalized Recommendation Services (PRS): The models of personal
activities are analyzed and built by the PRS. Personalized services are
recommended according to the personalized models, which are tuned from the
pre-defined general model and the person's behaviors. Personal services in
the digital home can be triggered automatically before the user controls them
manually. For instance, the status of the air conditioner, lighting,
television, and exercise devices, among others, could be preset.
In the user scenarios of this paper, the user carries a mobile measuring
device with a wireless ID card. The fixed homebox in the house receives the
user's bio-signals and locations from the wireless sensor. At the same time,
the MGS acquires these data together with the contexts from the environment.
The PSRS then actively selects adaptive services based on these contexts and
the personal models. The PSRS architecture is shown in Fig. 2.
Fig. 1 The distributed Services in PSRS
Fig. 2 System Architecture
The services in PSRS are also developed according to SOA principles, which
brings three key benefits. First, by upgrading service components, system
performance can be improved incrementally and faults can be reduced
gradually. Second, the system services can be enriched by increasing the
number of service components, so the system becomes progressively more
capable and friendlier. Third, users' demands can be met through dynamic
service composition: we can model a user's preferences and compose new
services for further recommendation. To preserve flexibility and
extensibility, the services in PSRS are distributed into different service
groups. They are explained in the following sub-sections and shown in Fig. 1.
3.1 Grouping Services
(1) Personal Profiles Services (PPS): Personal profiles are key factors when
recommending services to a person. Data stored in personal profiles can be
classified into static data and dynamic data. A person's name, ID, sex, and
blood type are categorized as static data. Dynamic data comprises personal
information that may change over time, such as age, habits, health status,
behavior features, service level, and authority. The dynamic data should be
automatically collected and analyzed by the information systems.
(2) Location Services (LS): The person's locations are usually key factors
in judging the person's behavior models.
3.2 Generating Recommendable Services
The recommendable services are reasoned out from the contexts, the personal
models, and the user's locations in the PSRS. The static factors and rules
are pre-defined with the web-based editor and stored in the knowledge base.
The dynamic factors used for triggering rules are dispatched from the PPS,
LS, and ENS. Services are then recommended by following the actions of the
triggered rules. In addition, dynamic rules are formulated from the tuned
personal models and the factors; if no personal model exists, the general
model is selected. The personal models are described in detail in Section 4.
The rule outputs may automatically pass messages or control devices in the
digital home, and may even invoke a series of other services. The decision
process is shown in Fig. 3.
4. Personalization Models
In PSRS, the primary contexts used to personalize services are the personal
models, the user's locations, and his/her health status. The user can select
service modes, such as manual device setting or automatic service
recommendation. When the user enters the service scope, his/her ID is sensed,
and the user's locations are identified to trigger recommendable services.
The interoperability of service provision is shown in Fig. 4. If personal
models exist, they are loaded to bind the activity patterns and to select the
proper items. The parameters of the devices can be set from the quantitative
items. For example, the settings of the air conditioner and lamps are
adjusted automatically to follow the user's preferences. The user's device
usage history is recorded to update the personal models, which are built and
analyzed through personal modeling.
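The binding step described above can be sketched as follows. This is a
minimal illustration in Java (the paper's implementation is in C# .NET), and
the item names and values are invented for the example: quantitative items
from a loaded personal model override the defaults of the general model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: apply a personal model's quantitative items as
// device parameters, falling back to the general model for missing items.
public class ModelBinder {
    public static Map<String, Integer> bindParameters(
            Map<String, Integer> personalModel,
            Map<String, Integer> generalModel) {
        // Start from the general defaults...
        Map<String, Integer> settings = new HashMap<>(generalModel);
        // ...then let the user's tuned preferences override them.
        settings.putAll(personalModel);
        return settings;
    }

    public static void main(String[] args) {
        Map<String, Integer> general = new HashMap<>();
        general.put("airConditionerTemp", 26);
        general.put("lampBrightness", 70);

        Map<String, Integer> personal = new HashMap<>();
        personal.put("airConditionerTemp", 24); // the user's preference

        Map<String, Integer> settings = bindParameters(personal, general);
        System.out.println(settings.get("airConditionerTemp")); // 24
        System.out.println(settings.get("lampBrightness"));     // 70
    }
}
```

When no personal model exists, an empty personal map simply yields the
general settings, matching the fallback behavior described in the text.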
Fig. 5 Personal Models Generation
If no personal models exist, the general models are recommended to bind the
user's patterns, and general parameters are set on the devices. If the user
manually sets the devices during usage, the user's intention is recorded for
tuning and building new personal models, which will be available the next
time. Naturally, the user can set the devices manually at any time, for
instance adjusting the music volume or the TV channel.
4.1 Personal Modeling
Fig. 3 Recommendable services Generation
The user's raw data are collected by recording the interactions between the
individual and the devices. By analyzing the raw data, personal models can be
generated. A user's activity patterns recur cyclically under the same
conditions, and the effective cyclic data are collected by mining the raw
data. By discretizing the effective data and grouping them into distinct
degrees, the cycle patterns can be found. A cycle pattern is composed of
ruling items, which are mapped to functions on the devices; the cycle
patterns can therefore be bound to serve the user. The flow of personal model
generation is shown in Fig. 5. The personal models are stored in data
repositories and, as mentioned in the previous section, can be updated with
new raw data. Likewise, the general models are generated by the same process
in Fig. 5; the difference is that all users' activities are selected for
analysis.
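The discretization and cycle-detection steps above can be sketched as
follows. This is an illustrative Java fragment, not the paper's code; the bin
boundaries and sample readings are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the modeling step: raw readings are discretized into distinct
// degrees, then a repeating cycle is detected in the degree sequence.
public class CycleMiner {
    // Map a raw reading into a discrete degree (0 = low, 1 = mid, 2 = high).
    // The thresholds are invented for illustration.
    static int discretize(double value) {
        if (value < 10.0) return 0;
        if (value < 20.0) return 1;
        return 2;
    }

    // Return the shortest period p such that the degree sequence repeats
    // with period p, or the sequence length if no shorter cycle exists.
    static int cycleLength(List<Integer> degrees) {
        int n = degrees.size();
        for (int p = 1; p < n; p++) {
            boolean periodic = true;
            for (int i = p; i < n; i++) {
                if (!degrees.get(i).equals(degrees.get(i - p))) {
                    periodic = false;
                    break;
                }
            }
            if (periodic) return p;
        }
        return n;
    }

    public static void main(String[] args) {
        double[] raw = {5.0, 15.0, 25.0, 5.5, 14.0, 26.0}; // two cycles
        List<Integer> degrees = new ArrayList<>();
        for (double v : raw) degrees.add(discretize(v));
        System.out.println(cycleLength(degrees)); // 3
    }
}
```

In this toy run the six readings collapse to the degree sequence
0, 1, 2, 0, 1, 2, from which a cycle of length three is recovered; each
position in the recovered cycle would then be mapped to ruling items.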
4.2 Ruling Items and Cycle Patterns
Personal models are stored in a rule-based database. Each rule combines
factors and formulas. There are three kinds of factors: (1) events, which are
dynamic factors; (2) statuses, which are static factors; and (3) compounds,
which are composite factors. Each factor has its own identifying number:
static factors are represented by negative numbers, while dynamic factors are
represented by positive numbers. Formulas are composed of factors and are
stored
in IF-ELSE format in a database. So that they can be edited, the formulas are
represented as mathematical equations on the website. For instance, a formula
could be written as “2 + 3 + 7 + -8 = 10”, where the numbers are factor
identifiers and the symbol “+” denotes the sequence in which the factors
occur. Each launched formula corresponds to an active service. Formulas can
be launched iteratively by the dynamic factors during recommendation, and
composite services emerge from these iterations.

Fig. 4 Interoperability of Service Providing
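A formula of this kind can be sketched as an ordered list of factor
identifiers mapped to a service ID. The Java fragment below is illustrative
(the paper's implementation is in C# .NET), and treating "the factors occur
in this order" as an in-order subsequence match is our own assumption:

```java
import java.util.List;

// Sketch of the rule encoding: a formula is an ordered sequence of factor
// IDs (negative IDs are static factors, positive IDs dynamic ones) that,
// when observed in order, launches the service on the right-hand side.
public class FormulaMatcher {
    final List<Integer> factorSequence;
    final int serviceId;

    FormulaMatcher(List<Integer> factorSequence, int serviceId) {
        this.factorSequence = factorSequence;
        this.serviceId = serviceId;
    }

    // True if the observed factor IDs contain this formula's factors
    // as an in-order subsequence.
    boolean matches(List<Integer> observed) {
        int next = 0;
        for (int id : observed) {
            if (next < factorSequence.size()
                    && id == factorSequence.get(next)) {
                next++;
            }
        }
        return next == factorSequence.size();
    }

    public static void main(String[] args) {
        // "2 + 3 + 7 + -8 = 10": dynamic factors 2, 3, 7 followed by
        // static factor -8 launch service 10.
        FormulaMatcher f = new FormulaMatcher(List.of(2, 3, 7, -8), 10);
        System.out.println(f.matches(List.of(2, 5, 3, 7, -8))); // true
        System.out.println(f.matches(List.of(3, 2, 7, -8)));    // false
    }
}
```

Re-running the matcher as new dynamic factors arrive corresponds to the
iterative launching of formulas described above.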
5. Experiments and Verification
For service recommendation, the services can query context statuses through
web services. To demonstrate flexible service composition, the system
operator modifies the defined cycle patterns and ruling items; the modified
cycle patterns then provide different services. The scenarios are shown in
Fig. 8: the blue arrows are the first activities, the yellow arrows the
second, and the green arrows the third. The cycle patterns are shown in the
following:
(1) Enter POS4 + no user: ring doorbell
(2) Enter POS4 + not logged in: automatically log in
(3) Logged in + Leave POS4: log out
5.4 Test Case 3
5.1 Experiment Environment
The PSRS was implemented in the C# .NET programming language, and many types
of devices were integrated to verify it. A server was installed remotely to
host the web services. A laptop was connected to the ZigBee coordinator to
receive the user's locations. The ZigBee modules, purchased from Texas
Instruments, comprise six CC2430 reference nodes, two CC2431 blind nodes, and
one CC2430 coordinator. One programmable logic controller (PLC), linked to
the laptop via an RS232 interface, was used to control home devices: three
colored lights (red, green, white), one electric fan, and one doorbell. The
homebox and bio-server, used to measure the user's bio-signals, were provided
by the Institute for Information Industry (III).
5.2 Test Case 1
• Flexible services composition
(1) Enter POS5 + Leave POS5: Leave POS5 (ring doorbell)
(2) Enter POS5 + Evening: turn on light
(3) Turn on light manually + Leave POS5: turn off the light automatically
After modifying the cycle pattern:
(4) Enter POS5 + Not Evening: power on an electric fan
(5) Power on the electric fan + Leave POS5: power off the electric fan
• The user activity patterns
The cycle patterns are pre-defined in the web-based editor as follows:
(1) Pass through POS1: Turn off green light
(2) Pass through POS2: Turn on green light
(3) Pass through POS3: ring doorbell
(4) Pass through POS5: ring doorbell
(5) Clockwise (POS3+POS2+POS1+POS5+POS3):
flash red light
(6) Counter-clockwise (POS1+POS2+POS3+POS5+
POS1): flash white light
A user launches the services when his/her activities match the pre-defined
patterns. The scenario is shown in Fig. 6: the blue arrows display the
counter-clockwise pattern and the yellow arrows the clockwise pattern. The
experimental results showed that the PSRS correctly executed distinct
services based on the user's activity patterns.
5.3 Test Case 2
Fig. 6 User Activity Patterns (ZigBee Localization)
• Multiple-user login service
In this scenario, two users enter the sensing scope of the homebox. The first
user's profile is pre-loaded into the homebox when he/she enters the scope;
he/she can then log in automatically and measure his/her bio-signals. When
finished, he/she leaves the sensing scope and is logged out automatically,
after which the second user can log in and use the homebox. The users do not
need to operate the login process manually, so the login service works
automatically in a multi-user environment. The sequence of activities is
shown in Fig. 7: the yellow arrows are the first user's activities and the
blue arrows are the second user's. The cycle patterns are shown above.
Fig. 7 Multiple Users
Fig. 8 Flexible Services
6. Conclusion and Future Work
In this paper, we developed a personalized service recommendation system
(PSRS) for a home-care environment. The PSRS is capable of providing
appropriate services based on the user's preferences. For the recommendable
services, we explored the processes and data sources used to generate them.
Furthermore, we constructed personal models to record the user's activities
and habits. Through the personal models, the system is able to launch safety
alerts, recommended services, and healthcare services in the house
automatically. In the future, the proposed system and models could be
extended to mobile health monitoring and promotion in a home-care
environment.
[12] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve
Maler, François Yergeau eds. Extensible Markup Language
(XML) 1.0 (Fourth Edition), World Wide Web Consortium
(W3C) Recommendation, Nov., 2008.
[13] David B., Canyang Kevin L., Roberto C., Jean-Jacques M., Arthur R.,
Sanjiva W. et al., Web Services Description Language (WSDL) Version 2.0
Part 1: Core Language, W3C, 26 June 2007.
[14] Martin G., Marc H., Noah M., Jean-Jacques M., Henrik F.,
Anish K., Yves L. SOAP Version 1.2 Part 1: Messaging
Framework (Second Edition), W3C:
TR/2007/REC-soap12-part1-20070427/, 27 April 2007.
[15] Tom B., Luc C., Steve C. et al., UDDI Version 3.0.2, OASIS, 19 October
2004.
7. Acknowledgements
[16] Brahim Medjahed, Atman Bouguettaya, Ahmed K.
Elmagarmid, “Composing Web services on the Semantic
Web,” The VLDB Journal, 12(4), pp.333-351, Sep. 2003.
This research was partially supported by the second
Applied Information Services Development and
Integration project of the Institute for Information
Industry (III) and sponsored by MOEA, Taiwan R.O.C.
[17] W.M.P. van der Aalst, “Don’t go with the flow: Web
services composition standards exposed,“ IEEE Intelligent
Systems, 2003.
8. References
[18] Chi-Lu Yang, Yeim-Kuan Chang and Chih-Ping Chu, “A
Gateway Design for Message Passing on SOA Healthcare
Platform,“ in Proceedings of the Fourth IEEE International
Symposium on Service-Oriented System Engineering
(SOSE 2008) , pp. 178-183, Jhongli, Taiwan, Dec. 2008.
[1] Chi-Lu Yang, Yeim-Kuan Chang and Chih-Ping Chu,
“Modeling Services to Construct Service-Oriented
Healthcare Architecture for Digital Home-Care Business,”
in Proceedings of the 20th International Conference on
Software Engineering and Knowledge Engineering
(SEKE’08), pp. 351-356, July, 2008.
[2] B.P.S. Murthi, and Sumit Sarkar, “Role of Management
Sciences in Research on Personalization,” Management
Science, Vol. 49, No. 10, pp. 1344-1362, Oct. 2003.
[3] Asim Ansari and Carl F. Mela, “E-Customization,” Journal of Marketing
Research, Vol. 40, No. 2, pp. 131-145, May 2003.
[4] Kar Yan Tam and Shuk Ying Ho, “Web Personalization: Is It Effective?,”
IT Professional, Vol. 5, No. 5, pp. 53-57, Oct. 2003.
[5] Hung-Jen Lai, Ting-Peng Liang and Y.-C. Ku, “Customized
Internet News Services Based on Customer Profiles,“ in
Proceedings of the 5th international conference on
Electronic commerce, pp. 225-229, 2003.
[6] James Pitkow, Hinrich Schütze, Todd Cass, Rob Cooley,
Don Turnbull, Andy Edmonds, Eytan Adar and Thomas
Breuel, “Personalized Search,“ Communications of the
ACM, Vol. 45 Issue 9, pp. 50-55, 2002.
[7] Josef Fink, Alfred Kobsa and Andreas Nill, “User-Oriented
Adaptivity and Adaptability in the AVANTI project,“ in
Conference Designing for the Web: Empirical Studies,
Microsoft, Redmond, WA, Oct. 1996.
[8] Peter Brusilovsky, “Methods and techniques of adaptive
hypermedia,“ Journal of User Modeling and User Adapted
Interaction, Vol. 6, No. 2-3, pp. 87-129, 1996.
[9] Wolfgang Wahlster and Alfred Kobsa, “Stereotypes and User Modeling,” in
User Models in Dialog Systems, pp. 35-51, Springer, Berlin, Heidelberg, 1989.
[10] D. N. Chin, “KNOME: Modeling What the User Knows in UC,” in User Models
in Dialog Systems, pp. 74-107, Springer, Berlin, Heidelberg, 1989.
[11] M. Schneider-Hufschmidt, T. Kühme, and U. Malinowski, “User Modeling:
Recent Work, Prospects and Hazards,” in Adaptive User Interfaces: Principles
and Practice, 1993.
Design and Implementation of OSGi-Based Healthcare Box for Home Users
Bo-Ruei Cao1, Chun-Kai Chuang2, Je-Yi Kuo3, Yaw-Huang Kuo1, Jang-Pong Hsu2
Dept. of Computer Science and Information Engineering1
National Cheng Kung University, Tainan, Taiwan
Advance Multimedia Internet Technology Inc., Taiwan.2
Institute for Information Industry, Taiwan3
[email protected]
functions makes the aged who live alone encounter inconvenient situations,
especially those suffering from chronic diseases. The health-care issues of
aged people have therefore become significant and attractive research topics.
For aged people with chronic diseases such as hypertension, offering
Long-Distance Home-Care will effectively improve quality of life and reduce
the burden on the hospital-care system. Long-Distance Home-Care provides
functions such as personal emergency rescue and long-term physiological
signal monitoring (an aspect that is keenly important to the physiological
monitoring equipment industry) using electronic blood pressure devices, blood
glucose meters, and similar equipment. The Long-Distance-Care approach will
improve incomes and services for hospital systems, telecommunication
companies, and security companies. It is projected that the production value
of Long-Distance Home-Care in Taiwan will reach NT$7 billion by 2010 [1].
Research has shown that, beyond Taiwan, the global home-care market is
growing rapidly, by about 20% each year. In 2006 the global home-care market
was worth about USD 71.9 billion, and it is predicted to increase to USD 79.6
billion in 2010. If all related industry and institutional services were
included, the market scale would be even larger. What these statistics
underscore is the need for home and institutional care that lets senior
citizens play a greater part in modern-day society.
With the development of Internet technology and the continuing decline of
computing costs, the dream of digital home life is becoming realizable. In
this paper, we explore Open Service Gateway initiative (OSGi) technology to
develop a transferable framework that supports a cross-platform health-care
service environment on the Intelligent Home Health-Care Box platform, with
the following objectives: (1) to develop remote physiological signal
measurement; (2) to take advantage of OSGi to construct a transferable
framework for embedded computing; and (3) to reduce program size in an
embedded system and improve run-time performance. In this environment, home
users can request services through a service-discovery mechanism and interact
with health-care devices over the network. In other words, networking,
intelligence, and multimedia are the guidelines for investigating and
developing a residential information system that serves people at home in a
friendly manner and improves the quality of home living.
Keywords: Embedded System, medical equipment,
Home Health Care, OSGi, Remote Health Care.
1. Introduction
As medical science and technology develop, the average human lifespan is
growing and the social structure is ageing. According to the definition of
the World Health Organization (WHO), an ageing country is one in which more
than 7% of the population is older than 65. The WHO estimates that by 2020
most developed countries will face the problem of an ageing population; in
particular, Japan, Northern Europe, and Western Europe will have ageing
populations exceeding 20%.
Taiwan itself became an ageing country in September 1993. At present,
Taiwan's population aged over 65 exceeds two million, and according to the
latest survey the aged population has exceeded 9.1%. The Council for Economic
Planning and Development (CEPD) in Taiwan estimates that by 2031 the
population aged 65 and over will reach 19.7% of the total population; in
other words, one in every five people will then be aged. The degradation of
physiological
The Intelligent Home Health-Care Box platform [2] has already been realized,
using network and information technologies to provide an intelligent
assisted-care system. Allowing medical staff to remotely obtain measurements
of patients' physiological signals can greatly remove the current blind spots
of regular care visits. The acquired physiological signals are more
immediate, and the time spent travelling to homes for measurement is reduced.
In addition, the measured physiological signals become part of the patients'
medical records. The Intelligent Home Health-Care Box can therefore assist
caregivers in monitoring health status and give patients the best home-care
environment. As Internet technology develops and computing costs continue to
decline, however, the requirements for digital medical care will grow and
diversify. Such a delivery system will inevitably rely on a transferable
embedded-platform technology that provides remote computing service
composition and remote service delivery.
In this paper, we study a transferable framework for embedded computing
applied to digital care services. Based on an embedded platform, we design a
transferable computing technology that necessarily includes: (1) a real-time
execution environment for the constructed services; (2) automated service
management and scheduling; (3) real-time transmission of the content required
by services; and (4) service programs that can be transferred between
different platforms.
In other words, we must construct a model-oriented service structure that
achieves the targets of service description, service construction, service
authentication, and service delivery. Using home care as an example, we carry
out a situational analysis and build a prototype system to validate the
technology. We call this system the model-oriented nursing (MON) system.
The rest of this paper is organized as follows. Related work on information
technology applied to remote nursing is reviewed in Section 2. In Section 3,
the proposed remote MON system is presented in terms of architecture,
functions, and implementation. Numerical results of the MON system are
demonstrated in Section 4. Finally, this work is concluded in Section 5.
K. Doughty et al. [5] presented a monitoring system for dementia patients
living alone, which monitors the patients' daily behaviors and derives an
on-going dementia lifestyle index (DLI) for each patient. The DLI is
empirically useful for verifying the effectiveness of medical treatment and
for guiding the treatment of each patient. American TeleCare, Inc. [6],
established in 1993, has some 9500 home telemedicine products in the market,
including a patient station connected to a central station by a phone line to
transport the signals of a telephonic stethoscope, blood pressure meter, and
oximeter. The patient station monitors the patient's status and delivers the
data to the central station; however, it does not analyze the collected
signals or respond to exceptions. Nigel H. Lovell et al. [7] demonstrated a
web-based approach to the acquisition, storage, and retrieval of biomedical
signals. The home patient is monitored by a terminal that records blood
pressure, breathing, and pulse; the records are delivered to and stored in a
hospital database, so clinical doctors can treat the patient with more useful
medical information.
Most patient-monitoring applications [8-12] do not allow remote access
control from the care center, so such systems cannot be managed remotely. The
proposed MON provides a remote access-control function for management. In
this way, widely deployed MON units are feasible to maintain, and the
maintenance cost can be significantly reduced. The remote access control is
designed based on the Open Service Gateway initiative (OSGi).
Moreover, since the MON operates across the Internet, the impact of network
performance and the requirements on network resources should be studied.
Horng et al. [13-14] proposed a delay-control approach to guarantee
quality-of-service (QoS) for home users and a fine-granularity service level
agreement (SLA) to manage network resources. Huang et al. [15] presented a
residential gateway that translates communication protocols, coordinates
information sharing, and serves as a gateway to external networks for
integrated services. These evolving techniques greatly benefit users in home
environments. Thus, in this paper, the network resource usage caused by the
proposed MON is also investigated in depth.
2. Related Work
The nursing problem of aged people is a critical issue in most developed
countries, such as the United States, Japan, and the countries of Europe. In
the US, the care demands of aged people drive the market growth of
home-nursing services, which are gradually becoming a trend and attracting
much research. Previous work mainly focuses on employing modern information
and networking technologies to establish computer-aided home nursing systems.
For example, Wong et al. [3] proposed a lifestyle monitoring system (LMS)
that uses a passive infrared movement detector (PIR) to detect the behavior
and body temperature of the cared-for patient in a room. When unusual
conditions are sensed by a control box, the control box delivers the
collected data to
the laboratory for further analysis. N. Noury et al. [4] proposed a fall
sensor composed of infrared position sensors and magnetic switches for remote
monitoring of human behavior. Once the monitored person falls down, the fall
sensor notifies the remote care center through RF wireless networks, and the
care center assigns a neighboring rescuer to deliver in-time treatment to
save that person.
3. OSGi-based Healthcare Homebox of the Model-Oriented Nursing System
The system architecture of the MON is depicted in Fig. 2. It has three parts:
the hardware platform (Intel XScale 270-S), the system software, and the
functional module. The hardware platform is developed on the Intel XScale
270-S, which includes a processor, flash memory, SDRAM, and many interfaces.
The PXA270 processor [16] is designed to meet the growing demands of a new
generation of advanced technologies; offering high performance, flexibility,
and robust functionality, the Intel PXA270 is packaged specifically for the
embedded market and is ideal for the low-power framework of battery-powered
devices. The MON platform can run from an external 5 V power supply or from a
built-in 3500 mAh lithium battery. The battery supplies power for more than
5 hours, and the platform supports charging via the power supply or USB, so
it is very suitable for mobile devices. The main hardware specifications of
the Intel XScale 270-S are summarized in Table 1.
The system software has four parts: (1) device drivers, (2) the operating
system, (3) the embedded JVM, and (4) the OSGi framework.
(1) Device drivers:
The device drivers cover RS232, Ethernet, the frame buffer, the touch panel,
and sound. The RS232 driver connects RS232 medical instruments and collects
measurements from them; the Ethernet driver supports remote monitoring; the
frame buffer driver handles the display; the touch panel driver handles user
control; and the sound driver provides alerts.
(2) Operating system:
We use Linux, kernel version 2.6.9, as our operating system. For the C
library we use uClibc because it is the most suitable for embedded Linux.
(3) Embedded JVM:
An emerging issue is applying the JVM to embedded systems. Java standards are
dominated by Sun: Sun's JVM and Java APIs are regarded as the standard Java
platform, and any implementation must treat compatibility with Sun's JVM as a
top priority. For a long time, however, Sun was reluctant to open its Java
platform, and licensing concerns when using Java indirectly hindered its
promotion. This also inspired many open-source JVMs, such as Kaffe, Jikes
RVM, and JamVM. In the last year, Sun donated its Java technology to the
open-source community (OpenJDK [17]), but OpenJDK still supports few
platforms. Because we need a JVM that supports many platforms, we still need
another open-source implementation.
Existing open-source Java implementations are usually based on GNU Classpath
[18]. GNU Classpath 1.0 will be fully compatible with the 1.1 and 1.2 API
specifications, in addition to having significant (>95%) compatibility with
the 1.3, 1.4, 1.5, and 1.6 APIs. Because Classpath is largely compatible with
the Java API, many open-source JVM implementations use Classpath for their
API. Among the open-source JVMs, we chose JamVM [19] as our platform's JVM.
JamVM is a new Java virtual machine which conforms to version 2 of the JVM
specification (the blue book). In comparison with most other VMs (free and
commercial) it is extremely small. JamVM's interpreter is highly optimized,
incorporating many state-of-the-art techniques such as stack caching and
direct threading. Stack caching addresses the following problem: keeping a
constant number of stack items in registers is simple, but causes unnecessary
operand loads and stores. For example, an instruction that takes one item
from the stack and produces none (e.g., a conditional branch) has to load an
item from the stack, and that load is wasted if the next instruction pushes a
value onto the stack (e.g., a literal). It is better to keep a varying number
of items in registers, on an on-demand basis, like a
cache. Direct threading is used to save memory: rather than having the
compiler generate native code for each bytecode sequence directly, each
operation is implemented once as an independent subroutine in a library. At
translation time the bytecode is turned into a series of the subroutines'
memory addresses, and execution completes by running directly through this
series of addresses, without a central dispatch step.
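The stack-caching idea can be illustrated with a toy interpreter that keeps
the top-of-stack value in a local variable. This is a minimal sketch for
illustration only, not JamVM code (JamVM is written in C), and the tiny
instruction set is invented for the example:

```java
// Toy illustration of stack caching: the top of the operand stack lives
// in the local variable `tos`, so push/operate sequences avoid one memory
// load or store per instruction compared with a fully memory-based stack.
public class TinyInterp {
    static final int PUSH = 0, ADD = 1, MUL = 2;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0;   // number of items below the cached top
        int tos = 0;  // cached top-of-stack "register"
        int pc = 0;
        while (pc < code.length) {
            switch (code[pc++]) {
                case PUSH: stack[sp++] = tos; tos = code[pc++]; break;
                case ADD:  tos = stack[--sp] + tos; break;
                case MUL:  tos = stack[--sp] * tos; break;
            }
        }
        return tos;
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4.
        int[] code = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL};
        System.out.println(run(code)); // 20
    }
}
```

Note that ADD and MUL read one operand from memory and leave their result in
`tos` without writing it back, which is exactly the saving stack caching
aims for; JamVM generalizes this to a varying number of cached items.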
As most of the code is written in C, JamVM is easy to port to new
architectures. So far, JamVM supports and has been tested on the following
OS/architecture combinations: PowerPC, PowerPC64, i386, ARM, AMD64, and i386
with Solaris/OpenSolaris.
In addition, JamVM is designed to use the
GNU Classpath Java class library. A number of
classes are reference classes which must be modified
for a particular VM. These are provided along with
JamVM. JamVM should always work with the latest
development snapshot of Classpath.
(4) OSGi framework:
The OSGi framework implements a complete and dynamic component model,
something that does not exist in standalone Java/JVM environments.
Applications or components (deployed in the form of bundles) can be remotely
installed, started, stopped, updated, and uninstalled without requiring a
reboot.
The OSGi Alliance [20] (formerly the Open Services Gateway initiative, now an
obsolete name) is an open standards organization founded in March 1999. The
Alliance and its members have specified a Java-based service platform that
can be managed remotely. The core of the specifications is a framework that
defines an application life-cycle management model, a service registry, an
execution environment, and modules. Based on this framework, a large number
of OSGi layers, APIs, and services have been defined.
We chose OSCAR (Open Service Container Architecture) [21] as our platform's
OSGi framework because it is tiny: the OSCAR framework occupies only 388 KB
at run time, whereas other OSGi frameworks such as Knopflerfish and Equinox
need more than 5 MB. OSCAR has since been renamed Felix [22] and is being
developed by Apache.
Scenario 3: Physiological signal measurement
The MON platform has a touch panel that lets the user control the connected
RS232 physiological signal measuring apparatus, such as a ventilator, blood
pressure monitor, or pulsimeter, through the GUI bundle. The RS232 Interface
Bundle converts the physiological signals from the measuring apparatus and
records them on the MON platform. The information can even be uploaded to the
remote care center for monitoring and recording.
In the OSGi framework, a software component that can implement a function
completely and independently is known as a bundle. In terms of
implementation, bundles are normal JAR components with extra manifest
headers. A Bundle object is the access point for managing the lifecycle of an
installed bundle. The bundle lifecycle is shown in Fig. 3.
Each bundle installed in the OSGi environment must have an associated Bundle
object. A bundle must have a unique identity, a long chosen by the Framework.
This identity must not change during the lifecycle of a bundle, even when the
bundle is updated; uninstalling and then reinstalling the bundle must create
a new unique identity. A bundle can be in one of six states: UNINSTALLED,
INSTALLED, RESOLVED, STARTING, STOPPING, or ACTIVE. Values assigned to these
states have no specified ordering; they represent bit values that may be ORed
together to determine whether a bundle is in one of several valid states. A
bundle should only execute code when its state is STARTING, ACTIVE, or
STOPPING. An UNINSTALLED bundle cannot be set to another state; it is a
zombie and is only reachable because references to it are kept somewhere. The
Framework is the only entity allowed to create Bundle objects, and these
objects are only valid within the Framework that created them.
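Because the states are bit values, a check against several states is a
single mask test. The constants below mirror those defined in
org.osgi.framework.Bundle; the EXECUTABLE mask and helper are our own
illustration, not part of the OSGi API:

```java
// Illustration of the bit-value state encoding: ORing states together
// yields a mask, and one AND tests membership in any of those states.
public class BundleStates {
    // State constants as defined by org.osgi.framework.Bundle.
    static final int UNINSTALLED = 0x01;
    static final int INSTALLED   = 0x02;
    static final int RESOLVED    = 0x04;
    static final int STARTING    = 0x08;
    static final int STOPPING    = 0x10;
    static final int ACTIVE      = 0x20;

    // A bundle may execute code only in these states (our own mask).
    static final int EXECUTABLE = STARTING | ACTIVE | STOPPING;

    static boolean mayExecute(int state) {
        return (state & EXECUTABLE) != 0;
    }

    public static void main(String[] args) {
        System.out.println(mayExecute(ACTIVE));    // true
        System.out.println(mayExecute(INSTALLED)); // false
    }
}
```

This is the same pattern the real framework API uses, for example in
`BundleContext.getBundles()` filtering or state-mask arguments to bundle
trackers.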
The main purpose of the OSGi standard is to provide a complete point-to-point
service delivery solution between the remote care center and local MON
platforms. The OSGi therefore defines an open platform through which users
can download applications from the remote care center and install and execute
them automatically at any time. We hope that, through this open platform,
software and equipment services developed by different vendors can
communicate and work with each other. The functional module has a user
interface and four bundles, and the four bundles map to three scenarios. The
mapping is displayed in Fig. 4. The three scenarios developed in this work
are as follows.
Numeric Results
The front view of the hardware platform, an Intel XScale 270 used to develop the healthcare home box, is shown in Fig. 5. This XScale platform contains a mother board and a TFT-LCD. Based on this platform, the developed system software, application software and user-interface software are ported and integrated to realize the three scenarios described above. The front view of the user interface is depicted in Fig. 6. Instead of a keyboard, a GUI with a touch panel is provided to users. Such a friendly design is more valuable and feasible for home users.
Certainly, the system performance is also evaluated to verify the improvement of the Java virtual machine. Two key performance indices (KPIs) are chosen to evaluate the performance: the starting time and the memory utilization of the JVMs. Three kinds of JVM technology are compared: Kaffe, JamVM and embedded J2SE. The comparison results are shown in Fig. 7. Obviously, the adopted JamVM demonstrates better execution than Kaffe. The performance of JamVM is quite close to that of a typical embedded J2SE. Although JamVM and Kaffe both run Java programs as interpreters, JamVM's interpreter is highly optimized. As discussed in Section 3, JamVM incorporates many state-of-the-art techniques such as direct threading. These techniques give JamVM a high-performance interpreter, so it can start OSCAR and load the OSCAR bundle profile quickly.
However, the program size of embedded J2SE is larger than that of JamVM, as shown in Table 2: JamVM's program size is only 15.2 MB at run time. JamVM and Kaffe both use GNU Classpath as their Java library, so JamVM's program size is close to Kaffe's.
Scenario 1: Remote monitoring
A remote user can access the physiological information recorded on the MON platform, or get emergency messages from the Alert Key Bundle, through the Web Server Bundle. In addition, the user can also use the Web Server Bundle to remotely install, start, stop and upgrade bundles.
Scenario 2: Emergency call
In an emergency, the user can press the emergency button on the User Interface (in the title button). The Alert Key Bundle then starts and sends an emergency signal through the Web Server Bundle to notify the remote care center.

In this paper, we propose a model-oriented nursing system, called the MON system, to enable a home health care system and to satisfy the requirements for the next generation of home health care systems. The MON system, cooperating with remote care centers, plays an important role in realizing a smart home with health-care applications. Through the MON system, the patient enjoys medical information services and on-line interaction with the staff of the care center. The care center continuously monitors the medical measurements of each home patient. The experimental results show that MON effectively enhances the nursing quality of home patients through information and networking technologies. Besides, the performance of the deployment is also evaluated. The interaction between the patient and the service center is the key advantage of the proposed system and also reflects the trend. The proposed MON demonstrates a feasible approach to enhancing the home healthcare service to meet the requirements of aged people and the coming ageing society. In particular, the MON platform achieves remote operation, administration and maintenance (OAM) based on the OSGi standard, including the remote installation, update and control of software modules (bundles). This is an innovative feature for remote health care, making it more flexible and more immediate.
Acknowledgment
This paper is based partially on work supported by the National Science Council (NSC) of Taiwan, R.O.C., under grant No. NSC97-2218-E-006-014, and by the Institute for Information Industry of Taiwan, R.O.C.

References
[1] SenCARE industry backgrounder, SenCARE, 2009.
[2] M. F. Horng et al., "Development of Intelligent Home Health-Care Box Connecting Medical Equipments and Its Service Platform," Proc. of IEEE 9th International Conference on Advanced Communication Technology (ICACT 2007), CD-ROM, Korea, 2007.
[3] C. Wong and K. L. Chan, "Development of a portable multi-functional patient monitor," Proc. of the 22nd Annual EMBS Int'l Conf., vol. 4, pp. 2611-2614, 2000.
[4] N. Noury et al., "Monitoring behavior in home using a smart fall sensor and position sensors," Proc. of the 1st Annual International Conference on Microtechnologies in Medicine and Biology, pp. 160-164, 2000.
[5] K. Doughty, "DIANA - a telecare system for community," Proc. of the 20th Annual EMBS Int'l Conf., vol. 4, pp. 1980-1983, 1998.
[6] AmericanTeleCare,
[7] N. Lovell et al., "Web-based Acquisition, Storage, and Retrieval of Biomedical Signals," IEEE Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp. 38-44, 2001.
[8] Kevin, "Implementation of a WAP-based telemedicine system for patient monitoring," IEEE Transactions on Information Technology in Biomedicine, vol. 7, no. 2, pp. 101-107, 2003.
[9] F. Magrabi, N. H. Lovell, and B. G. Celler, "Web based longitudinal ECG monitoring," Proc. 20th Annu. Int. Conf. IEEE EMBS, vol. 20, no. 3, pp. 1155-1158, 1998.
[10] S. Park et al., "Real-time monitoring of patient on remote sites," Proc. 20th Annu. Int. Conf. IEEE EMBS, vol. 20, no. 3, pp. 1321-1325, 1998.
[11] B. Yang, S. Rhee, and H. H. Asada, "A twenty-four hour tele-nursing system using a ring sensor," Proc. 1998 IEEE Int. Conf. Robotics and Automation, pp. 387-392, 1998.
[12] Yonghong Zhang, Jing Bai and Wen Lingfeng, "Development of a home ECG and blood pressure telemonitoring center," Proc. 22nd Annu. Int. Conf. IEEE EMBS, pp. 23-28, 2000.
[13] Mong-Fong Horng and Yau-Hwang Kuo, "A rate control scheme to support isochronous delivery in wireless CDMA link by using state feedback technique," Proc. of IEEE 6th International Conference on Advanced Communication Technology (ICACT 2004), vol. 1, pp. 361-366, Korea, Jan. 2004.
[14] Chien-Chung Su, Wei-Nung Lee, Mong-Fong Horng, Jeng-Pong Hsu and Yau-Hwang Kuo, "Service Level Agreement: A New Bandwidth Guarantee of Flow-level Granularity in Internet VPN," Proc. of IEEE 6th International Conference on Advanced Communication Technology (ICACT 2005), vol. 1, pp. 324-329, Korea, Feb. 2005.
[15] W. S. Hwang and P. C. Tseng, "A Bandwidth Management," IEEE Transactions on Consumer Electronics, vol. 51, no. 3, pp. 840-848, Aug. 2005.
[16] Intel PXA270 processor,
[17] OpenJDK,
[18] GNU Classpath,
[19] JamVM,
[20] OSGi Alliance,
[21] OSCAR,
[22] Felix,
Fig.1 Market Scale of the Homecare Industry
Source: Department of Industrial Technology,
MOEA, 2008/04
Fig. 2 System architecture of MON (layers: OSGi Framework, Embedded JVM, Frame buffer, Touch panel)
Fig. 6 User interface appearance
Fig. 7 Performance comparison of various JVM
modules in Healthcare home box.
Table 1 Intel XScale 270-S specifications
CPU: Intel XScale PXA270, 520 MHz
LCD monitor: Sharp 3.5" TFT, 320 x 240
Touch panel: 3.5" four-wire touch LCD, UCB1400BE controller
Serial: 2 RS232 interfaces, 1 full-function serial port
USB host: 1 host interface
USB client: 1 client interface
LED: 8 LED lamps
Fig. 3 The lifecycle of a bundle
Table 2 Measurements of system performance (program size in MB of JamVM, Kaffe and embedded J2SE)
Fig. 4 The scenario and bundles mapping
Fig. 5 MON platform
An Approach for Tagging 3D Worlds for the Net
Fabio Pittarello
Università Ca’ Foscari di Venezia
Dipartimento di Informatica
Via Torino 155, 30172 Mestre (VE), Italy
[email protected]
Abstract—Free-tagging is one of the leading mechanisms characterizing the so-called web 2.0, enabling users to collaboratively define the meaning of web data in order to improve their findability. Tagging is applied in several forms to hypermedia data, which represent the largest part of the information on the net.
In spite of that, there is a growing body of web data made of 3D vectors representing real objects, such as trees, houses and people, that lacks any semantic definition. This situation prevents any advanced use of the data contained inside these 3D worlds, including seeking, filtering and manipulation of the objects represented by vectors.
This work proposes a bottom-up approach for adding semantics to such data, based on the collaborative effort of users navigating the web. After describing the similarities and the differences that characterize tagging for hypermedia and for interactive 3D worlds, the paper discusses the design choices that have guided the definition of a specification for inserting tags in the latter environments.
The approach permits the annotation of most 3D worlds compliant with X3D, the ISO standard for describing interactive 3D worlds for the net.
Because of the availability of faster graphics cards and broader communication networks, the number of 3D worlds for the net is gradually increasing. The application domains are different, ranging from urban studies and tourism to social networking. In most cases the modeling of 3D environments and objects is based on low-level geometric elements like polygonal meshes or, for the most advanced environments, on objects belonging to the family of NURBS surfaces. The authors of 3D worlds implicitly associate a semantics that is recognized by the visitors of the 3D environments; a successful outcome of this process is granted both by the skill of the author and by the existence of a common cultural background shared between the author and the visitor of the 3D world.
Unfortunately, no high-level information related to the objects represented by the polygonal meshes, or to their relations, is usually available in the files where the 3D information is stored.
The lack of any high-level annotation for the components of these environments prevents any use different from the direct visualization and interaction with the single 3D world. A range of possible interesting uses of such information includes:
• indexing of high-level information by search engines; such information may then be used for seeking different 3D worlds, basing the process on the indexed labels;
• comparison of different 3D worlds based on the analysis of the similarity of labels;
• automatic presentation of high-level information to the users navigating the 3D environment, associated to the location and to the objects they are currently browsing;
• extraction of semantic objects for examination or for the automatic creation of high-level repositories (e.g., a repository of trees extracted from different 3D worlds).
In the last few years there have been a number of proposals for adding semantic information to 3D worlds.
Most proposals are characterized by a top-down approach: low-level geometric objects are associated to instances of high-level classes, belonging to predefined domain ontologies (e.g., the kitchen wall, belonging to the class wall). In these proposals the annotation process is constrained, because the user may use only one of the available classes (e.g., the class wall) and relations (e.g., the containment relation).
This work proposes a different, complementary approach based on the free selection and annotation of geometric objects. While this process, widely diffused in the hypertextual web and known as tagging, is characterized by informational fuzziness, it gives a powerful opportunity of labelling objects according to different points of view and lets the high-level semantics of the tagged objects emerge gradually and dynamically.
Although this work shares the general concepts and practices of hypermedia tagging, there are some differences and additional issues deriving from the specific application to 3D worlds. In particular:
• the information objects available for tagging are not clearly identified from the start, as happens for hypermedia tagging;
• the 3D scene may be populated by vectors characterized by different levels of granularity that can't always be associated to a specific high-level meaning; therefore they may require a preliminary grouping operation before assigning a tag to them; the different grouping choices that might be operated by the users represent an additional variable that adds a level of complexity to the tagging process;
• although some standard information structures for presenting and navigating the result of the tagging activity may be derived from hypermedia (e.g., the so-called tag cloud), 3D worlds may benefit from different presentation techniques for avoiding presentation clutter; for example, the tags to present may be filtered according to the current position and orientation of the user avatar inside the 3D world.
The rest of the work is organized as follows: Section 2 will consider related works, with a particular reference to the semantic description of 3D environments and to hypermedia tagging; Section 3 will compare tagging for hypermedia and interactive 3D worlds; Section 4 will describe the goals and the design choices of this proposal; Section 5 will show how tags may be included in a standard X3D file for describing objects and spaces; Section 6 will conclude the paper, giving some hints for future development.
The technique of free tagging, typical of the so-called web 2.0, permits final users to annotate documents, giving birth to new structures for organizing information. While these structures, called folksonomies [13], suffer from drawbacks such as homonymy, synonymy and polysemy, which are endemic to the bottom-up building process, they offer an additional opportunity to label information from the user's point of view.
If, according to Boiko [14], content can be defined as the
sum of data and associated metadata, we may say that the
application of user-specific metadata to data generates multiple
contents, derived from the interpretation of data given by
different users.
The bottom-up approach is opposed to the classic top-down approach, in which a designer defines the information structure of the site [15]. Both approaches have specific points of strength and suffer from drawbacks. That is the reason why some authors have proposed different forms of integration, for reconciling the need for rigorous classification with increased expressivity and improved findability. While some of the experiences reported in the literature are targeted at deriving ontologies from folksonomies [16] [17], other approaches go towards the integration of top-down and bottom-up structures, originating two complementary systems for navigating information [18].
Research related to the semantic annotation of multimedia documents has become increasingly important in the last few years. In the context of the audio-video domain, the Moving Picture Experts Group (MPEG) [1] has defined a set of standards for coding and describing such data. The most interesting standards in relation to this work are MPEG-4 [2] and MPEG-7 [3] [4]. The first specification defines a multimedia document as the sum of different objects and includes an XML-based format containing a subset of X3D [5], the ISO standard for describing 3D worlds for the web. The latter specification permits the description of multimedia content of different natures (e.g., MPEG-4, SVG, etc.).
Some interesting proposals [6] [7] [8] use the MPEG-7 standard for annotating the semantics of a 3D scene. Halabala [7] uses MPEG-7 to store scene-dependent semantic graphs related to a 3D environment. Mansouri [8] also uses MPEG-7 for describing the semantics of virtual worlds. The feature is introduced for enhancing queries and navigation inside 3D environments (e.g., the system can return virtual worlds after semantic queries such as "I am looking for a big chair").
Concerning the web, the World Wide Web Consortium promotes the definition of a set of languages, rules and tools for the high-level description of information. The semantic web is composed of different layers, where the lowest one is occupied by the data themselves (expressed in XML) and the higher ones describe - through the introduction of languages such as RDF (Resource Description Framework) [9] and OWL (Web Ontology Language) [10] - the semantic properties of such data. Pittarello et al. [11] propose to integrate such languages in a scene-independent approach for annotating 3D scenes. In this approach the X3D language is used for describing the geometric properties of 3D environments and their associations with high-level semantics, while RDF and OWL are used for defining the scene-independent domain ontology.
The annotation process proposed in [11] includes not only the geometric objects defined in the scene, but also the spaces generated by these objects and inhabited by (virtual) humans. The approach stems from a previous research work [12] aimed at labelling the environment's spatial locations in a multimodal way, in order to enhance user orientation and navigation inside them.
This paper, inspired by the work done in the hypermedia domain, suggests using tagging as a means to let content emerge from the raw 3D data. As stated at the beginning of this work, the application of semantic labels is particularly relevant in a situation characterized - in most cases - by the lack of any high-level information.
The lack of this information prevents any use of the data different from what has been conceived by the world author. In most cases the use is related to the simple visualization of, or interaction inside, a specific 3D world.
This situation represents a serious drawback compared with what happens in the hypermedia web, where the searching and navigation possibilities rely not only on the structures designed by the information architects of the specific sites, but also on the indexing activity of web crawlers and on the classification activity made by users through tagging.
The possibility to search and browse across a network of different web sites is one of the peculiar features of the web and one of the reasons for its success. In contrast, most of the 3D worlds available on the net are separate islands that can't be cross-searched, filtered or compared.
Tagging may represent an opportunity for letting information emerge from the raw representation and for building powerful cross-world searching and navigation systems.
This work suggests using tagging, applied to 3D worlds, in all those situations where an ontology for the specific domain is not available, or where the existing ontology may be profitably used only by skilled users. For example, an ontology targeted at classic architecture may be profitably used only by subjects
that are aware of the meaning of terms such as capital, triglyph or entablature. Users that are not trained in the architecture domain might be unable to use such technical terms, and they might still want to classify the available information with their own words.
The goal of the proposal is to preserve the same freedom of tagging that is typical of hypermedia tagging systems. In reaching such a goal, there are a number of difficulties that are typical of 3D worlds. For hypermedia, the class of objects that may be tagged (e.g., a web page, a video or a photograph) is clearly identified during the design phase, and all the tags defined by users will be associated to instances of this class. Besides, during the tagging phase the targets can be clearly identified: users don't have to select them among other types of objects, but just specify tags.
That is not true for 3D worlds, where the raw data represent objects belonging to different classes and have different levels of granularity. The modeling process may lead, for example, to the use of a single mesh for representing a chair or - alternatively - to the use of different meshes for defining the legs and the seat. Meshes, during the modeling phase, may be grouped, depending on the author's habits.
Additionally, 3D modeling practices may lead to the creation of geometrical objects that don't have a semantics if considered separately. Fig. 1 displays the object chair modeled with a different number of components. For the model on the left, all the components have a semantics that can be easily identified (e.g., the legs, the seat, etc.). The model on the right is characterized by two components, labeled 5a and 5b, that don't have a specific semantics, being only subsets of a leg. Generally speaking, this situation may derive from the fact that a specific set of meshes has been modeled only for obtaining a result in terms of visual presentation, rather than keeping in mind the direct association with a high-level meaning. Besides, in some situations meshes may also derive from some automatic process (e.g., a 3D scanning operation) that may not take the object semantics into account. In such cases it may be necessary to group or split objects as a preliminary operation for associating a meaning to them.
Fig. 1. Different styles for defining the components of a chair
One of the goals of our proposal is to give the user the possibility to apply the labels that identify the semantic properties of the objects with the maximum freedom. As stated before, the 3D domain is characterized by different classes of objects that may be tagged and by different levels of granularity. We decided to treat this situation as an additional opportunity for tagging. According to this choice, in our proposal all the geometric objects belonging to the 3D world are taggable. Users are also enabled to define new groups of objects and associate tags to them (e.g., the user may decide to tag the single components of the chair defined in Fig. 1 and then to define a group where to put all the components and tag it as chair).
Of course we are conscious that different users may decide to define and tag overlapping groups of objects, as can be seen in Fig. 2. In this example two users apply different styles for tagging the objects of a room. The first user applies the tags chairs and tables - evidenced in light gray - after grouping the objects belonging to the same category of furniture; the second user groups and tags the objects in relation to the owner (i.e., john's furniture and mary's furniture).
Fig. 2. Different styles for grouping and tagging objects
Different tagging styles may represent an issue for the progressive building of the world semantics, introducing a significant amount of informational noise.
On the other side, the opposite choice of forbidding overlapping groups may support informational convergence, but may present additional problems. For example, the association of tags to groups of objects may be restricted to groups already present in the scene graph or defined by previous users, forbidding the creation of groups that use only a part of the components of existing groups. Unfortunately, following this methodology, inaccurate grouping choices made by previous users can't be further modified. The process may push the tagging activity in the wrong direction, originating bad semantic associations, such as groups lacking a part of the semantically relevant components.
Some techniques, such as the simple suggestion of the groups already defined by other users, may be an acceptable compromise for reducing the informational noise and supporting convergence towards a meaningful semantics. The issue will be further considered in the ongoing development of the project, where users will experiment with a prototype interface - under development - for tagging, and their effort will be
Fig. 3. Tagging a chair using a narrow and a broad folksonomy
Concerning the accumulation of the tagging activity done by different users, the system may permit storing only one instance of a specific tag for a given object - as happens in Flickr, the well-known web application for sharing photographs on the net - or also the number of occurrences. The structures derived from the latter approach are named broad folksonomies. They are opposed to the narrow folksonomies that characterize the first approach and - as explained in [19] - permit a better understanding of the terms that are most used by people for classifying objects.
Both approaches may be used for the 3D domain, as shown in Fig. 3, where the chair on the left is tagged with a narrow folksonomy, while the same object - on the right - is tagged with a broad folksonomy. In both cases the system may also preserve the identity of the user tagging the object. Such additional information may enable additional processing, such as the extraction of the tags assigned - for a given 3D world - by a single user or by a subset of users corresponding, for example, to a specific category.
Our design choice is to permit the accumulation of the instances of a given tag. Of course, a narrow folksonomy may be easily derived from the resulting broad folksonomy.
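As a small illustrative sketch (ours, not from the paper) of this derivation: a broad folksonomy that maps each tag to its number of occurrences collapses to a narrow one simply by discarding the counts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: derive a narrow folksonomy (each distinct tag once)
// from a broad folksonomy (tag -> occurrence count), e.g. the chair of Fig. 3.
public class FolksonomyDemo {
    public static Set<String> toNarrow(Map<String, Integer> broad) {
        // dropping the counts keeps only the distinct tags (sorted here
        // just to make the output deterministic)
        return new TreeSet<>(broad.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> broad = new HashMap<>();
        broad.put("chair", 1);        // applied by one user
        broad.put("wooden_chair", 2); // applied by two users
        broad.put("my_chair", 4);     // applied by four users
        System.out.println(toNarrow(broad)); // prints [chair, my_chair, wooden_chair]
    }
}
```

The reverse derivation is impossible, which is why the design choice of storing the occurrence counts preserves strictly more information.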
Another goal of our proposal is to maximize the number of existing 3D worlds on the net that may be enhanced with a semantic description. A parallel structure for storing semantic information is defined that doesn't modify the existing relations stored in the scene graph. Such an approach permits enhancing existing worlds while minimizing harm to the visualization and the interactivity that characterize the original 3D environments. The following section will show how the specification defined for tagging permits - taking advantage of the X3D standard - reaching such a goal.
We chose X3D as the target language for our methodology. X3D [5] is a widely diffused language for representing interactive geometric objects for the net. All the objects that are part of an X3D world - including the geometric objects - are described through nodes - which can be nested - and fields - where the properties of the objects may be stored. X3D represents the evolution of VRML97 and adds to it the capability to insert specific nodes for metadata, to specify information related to the objects of the 3D world.
Unfortunately, the X3D standard doesn't suggest how to take advantage of metadata nodes for defining structured semantic information inside 3D worlds.
In a previous work [11], the author suggested an approach for specifying high-level information for 3D worlds, using these nodes and an associated scene-independent domain ontology. In this work X3D metadata are used as the basis for associating tags to geometrical objects. This bottom-up approach is complementary to the previous one and is designed to be merged with it.
In the previous work we considered the concept of geometric object as opposed to the concepts of real and virtual semantic objects. The first category represents the raw information that may be found in any 3D file. It may be a single geometric shape or a group of geometric objects.
We coined the concept of real semantic object for all the cases where it is possible to associate a high-level meaning to a geometrical object.
Unfortunately, such an association - as discussed in the previous sections - can't always be found. There may be cases where geometric objects or groups defined in the scene graph can't be directly associated to a specific meaning, or where such an association doesn't make sense (e.g., many small objects - such as the stones displayed in Fig. 4 - don't need a specific reference for each object, but may collectively be associated to a single label). Besides, there may also be the need to introduce higher-level semantic groupings for adding expressivity to the scene description.
For all those situations we defined the concept of virtual semantic object, a labelled container that collects a set of geometrical objects, lower-level semantic objects or even a mix of those entities.
In this work we take advantage of the same definitions. In spite of that, we propose a different, complementary structure for metadata, suitable to the tagging needs, giving the possibility to have different labels for the same object and defining all the high-level semantics inside the same X3D file that describes the geometry.
Fig. 4. Geometric and semantic objects

<Shape DEF='chair0123'>
  <MetadataSet name="folksonomy">
    <MetadataSet name="tagslist" reference="">
      <MetadataString value="'0004' 'my_chair'"/>
      <MetadataString value="'0002' 'wooden_chair'"/>
      <MetadataString value="'0001' 'chair'"/>
    </MetadataSet>
    <MetadataSet name="grouping" reference="">
      <MetadataString name="furniture235"/>
    </MetadataSet>
  </MetadataSet>
</Shape>
Fig. 4 shows a sample of objects belonging to the three
categories discussed above. Single geometric objects are evidenced through their geometric shapes. In some cases the
geometric objects have been grouped by the world designer
and this information - stored in the scene graph - has been
evidenced with the circle labelled group. Real semantic objects
are associated to geometric objects, single or grouped. Some
of them are characterized by single tags (i.e., legs and top).
The real semantic object associated to the shape that identifies
the chair is characterized by a set of tags (chair, my chair and
wooden chair), assigned by different users.
Virtual semantic objects, tagged as table, furniture and
pebbles, have been specified where it has not been possible
to use the existing shapes or grouping nodes of the scene
graph for storing high-level information. These new objects are
therefore introduced for completing the semantic description
of the 3D world.
The code example displayed in Fig.5 shows the definition
of a real semantic object, associated to the geometric shape
defining the chair of Fig.4. A set of nested MetadataSet and
MetadataString nodes are used for defining a metadata section
inside the existing geometrical shape, chair0123.
All the tags and the number of occurrences for each tag
are stored as a set of MetadataString nodes, nested inside
a MetadataSet named tagslist. Another MetadataSet, named
grouping, is used to contain the references to higher-level
virtual objects; in this example the geometrical object is
semantically associated - through the nested MetadataString
node - to the virtual object named furniture235, tagged as
furniture in Fig.4.
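A hypothetical reader for these values (our sketch, not part of the paper's specification): each tagslist MetadataString stores the occurrence count and the tag as two quoted strings, which can be split back apart.

```java
// Hypothetical parser for a tagslist entry such as "'0004' 'my_chair'",
// i.e. an occurrence count followed by the tag itself (both single-quoted).
public class TagEntry {
    public final int occurrences;
    public final String tag;

    public TagEntry(int occurrences, String tag) {
        this.occurrences = occurrences;
        this.tag = tag;
    }

    public static TagEntry parse(String value) {
        String[] parts = value.trim().split("\\s+"); // -> ["'0004'", "'my_chair'"]
        int count = Integer.parseInt(parts[0].replace("'", ""));
        String tag = parts[1].replace("'", "");
        return new TagEntry(count, tag);
    }

    public static void main(String[] args) {
        TagEntry e = TagEntry.parse("'0004' 'my_chair'");
        System.out.println(e.occurrences + " " + e.tag); // prints 4 my_chair
    }
}
```

Keeping the count inside the value string, rather than as a separate node, is what lets a single MetadataString summarize how many users applied a given tag.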
3D worlds are not made only of objects, but objects generate
spaces that are inhabited by (virtual) humans. Such spaces may
be proficiently labeled. That is the reason why - in coherence
with what we did for the previous top-down proposal - in this
work we extended the possibility to use tags also for spaces.
The X3D object that we currently use for associating tags to spaces is the ProximitySensor node, an invisible node used for monitoring the user's actions inside the 3D world. ProximitySensor nodes may be used to define a set of locations and may also be nested for defining a hierarchy of spaces.
Fig. 5. A real semantic object tagged with three labels.
The code fragment displayed in Fig. 6 shows how to associate tags to a proximity sensor available in the X3D scene. The structure of the metadata nodes is similar to the one displayed in the previous example. Also in this case, different tags (i.e., my room, sitting room and small room) have been used for classifying the same object. Because no higher-level space has been defined in the example, the MetadataSet node named grouping doesn't contain any MetadataString node.
<ProximitySensor DEF='room457'>
  <MetadataSet name="folksonomy">
    <MetadataSet name="tagslist" reference="">
      <MetadataString value="'0004' 'my_room'"/>
      <MetadataString value="'0003' 'sitting_room'"/>
      <MetadataString value="'0001' 'small_room'"/>
    </MetadataSet>
    <MetadataSet name="grouping" reference="">
    </MetadataSet>
  </MetadataSet>
</ProximitySensor>
Fig. 6. A space tagged with three different labels.
The code given in Fig. 7 illustrates how to define and tag a virtual semantic object starting from real geometric objects. The geometric objects are the components of the table presented in Fig. 4. The virtual semantic object tagged as table is based on two different real semantic objects, defining the legs and the top of the table. Each real semantic object has a structure similar to the one described in Fig. 5 and is linked to the virtual semantic object through MetadataString nodes named table457 (i.e., the identifier of the virtual object).
The virtual semantic object is defined as a set of MetadataSet and MetadataString, whose structure reflects that one
adopted for real semantic objects. In spite of that, while
the latter objects are defined inside existing geometrical and
grouping nodes, the information related to virtual objects can’t
be referred to any existing node belonging to these categories.
For achieving our goal, we specify a section inside the
WorldInfo node, a standard X3D node used for giving a
description of the content of a specific world. Each virtual
semantic object - like the virtual object table457, tagged with
the label table - is defined as a MetadataSet node, nested into
the main MetadataSet named virtual objects. The code shows
also an additional relation of the virtual object table457 with
an higher-level virtual object, furniture235, not represented in
the example.
This bottom-up approach is meant to complement the top-down ontology-based labelling described in a previous work.
Currently the navigation and interaction potential of most 3D worlds is limited to what has been designed by the world author. Additional possibilities, such as advanced searching and filtering, may emerge from the availability of high-level information associated with the raw data. The use of a widely adopted file format for the 3D Web, X3D, and the specification of a unified methodology for tagging the components of the different worlds may extend these opportunities to a large number of 3D worlds deployed on the web, enabling cross-world searching, filtering and extraction of objects. Ongoing work is focused on the implementation of a prototypical interface for verifying the design choices and receiving hints for future development.
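A practical consequence of embedding tags in standard X3D nodes is that they can be harvested with ordinary XML tooling. The sketch below is an illustrative example only, assuming the node layout of Figs. 6-7; the sample scene and the helper function are ours, not part of the X3D specification:

```python
import re
import xml.etree.ElementTree as ET

# Sample scene following the tagging scheme of Figs. 6-7 (invented data).
X3D_SCENE = """
<Scene>
  <ProximitySensor DEF="room457">
    <MetadataSet name="folksonomy">
      <MetadataSet name="tagslist" reference="">
        <MetadataString value="'0004' 'my_room'"/>
        <MetadataString value="'0003' 'sitting_room'"/>
      </MetadataSet>
      <MetadataSet name="grouping" reference=""/>
    </MetadataSet>
  </ProximitySensor>
</Scene>
"""

def extract_tags(x3d_text):
    """Map each DEF-named node to the tag labels found in the
    MetadataString values nested inside it."""
    tags = {}
    for node in ET.fromstring(x3d_text).iter():
        name = node.get("DEF")
        if name is None:
            continue
        for ms in node.iter("MetadataString"):
            # An MFString value such as "'0004' 'my_room'": the second
            # quoted token is the tag label.
            tokens = re.findall(r"'([^']*)'", ms.get("value", ""))
            if len(tokens) == 2:
                tags.setdefault(name, []).append(tokens[1])
    return tags

print(extract_tags(X3D_SCENE))
```

Applied across many worlds, an index built this way would support the cross-world searching and filtering mentioned above.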
<WorldInfo>
  <MetadataSet name="virtual_objects">
    <MetadataSet DEF='table457'>
      <MetadataSet name="folksonomy">
        <MetadataSet name="tagslist" reference="">
          <MetadataString value="'0001' 'table'"/>
        </MetadataSet>
        <MetadataSet name="grouping" reference="">
          <MetadataString name="furniture235"/>
        </MetadataSet>
      </MetadataSet>
    </MetadataSet>
  </MetadataSet>
</WorldInfo>
<Shape DEF='top0129'>
  <MetadataSet name="folksonomy">
    <MetadataSet name="tagslist" reference="">
      <MetadataString value="'0001' 'top'"/>
    </MetadataSet>
    <MetadataSet name="grouping" reference="">
      <MetadataString name="table457"/>
    </MetadataSet>
  </MetadataSet>
</Shape>
<Group DEF='legs234'>
  <MetadataSet name="folksonomy">
    <MetadataSet name="tagslist" reference="">
      <MetadataString value="'0001' 'legs'"/>
    </MetadataSet>
    <MetadataSet name="grouping" reference="">
      <MetadataString name="table457"/>
    </MetadataSet>
  </MetadataSet>
</Group>
Fig. 7. Virtual and real semantic objects.
[1] “MPEG Homepage,”
[2] F. Pereira and T. Ebrahimi, The MPEG-4 Book. Prentice-Hall, 2002.
[3] F. Nack and A. T. Lindsay, "Everything you wanted to know about MPEG-7 - part 1," IEEE Multimedia, vol. 6, no. 3, pp. 65-77, 1999.
[4] ——, "Everything you wanted to know about MPEG-7 - part 2," IEEE Multimedia, vol. 6, no. 4, pp. 64-73, 1999.
[5] X3D, “Extensible 3D (X3D) architecture and base components edition 2 ISO/IEC IS 19775-1.2:2008,”
specifications/ISO-IEC-19775-1.2-X3D-AbstractSpecification/, 2008.
[6] I. M. Bilasco, J. Gensel, M. Villanova-Oliver, and H. Martin, “On
indexing of 3D scenes using MPEG-7,” in Proceedings of the 13th
Annual ACM International Conference on Multimedia. ACM Press,
2005, pp. 471–474.
[7] P. Halabala, “Semantic metadata creation,” in Proceedings of CESCG
2003: 7th Central European Seminar on Computer Graphics, 2003, pp.
[8] H. Mansouri, “Using semantic descriptions for building and querying
virtual environments,” Ph.D. dissertation, Vrije Universiteit Brussel,
[9] RDF, “RDF Primer W3C Recommendation,”
rdf-primer/, 2004.
[10] OWL, “Web Ontology Language Guide,”
owl-guide/, 2004.
[11] F. Pittarello and A. De Faveri, “Semantic description of 3D environments: a proposal based on web standards,” in Proceedings of Web3D,
11th International Symposium on 3D Web. ACM Press, New York,
[12] F. Pittarello, “Accessing information through multimodal 3d environments: towards universal access,” Universal Access in the Information
Society, vol. 2, no. 2, pp. 189–204, 2003.
[13] T. Vander Wal, “Folksonomy,”,
[14] B. Boiko, Content management bible. Wiley Publishing, 2004.
[15] L. Rosenfeld and P. Morville, Information Architecture for the World
Wide Web. O’Reilly, 2006.
[16] P. Spyns, A. de Moor, J. Vandenbussche, and R. Meersman, “From
folksologies to ontologies: How the twain meet,” in Proceedings of
On the Move to Meaningful Internet Systems, ser. Lecture Notes in
Computer Science, vol. 4275. Springer, 2006, pp. 738–755.
[17] C. Van Damme, M. Hepp, and K. Siorpaes, “Folksontology: An integrated approach to turning folksonomies into ontologies,” in Proceedings
of the ESWC Workshop Bridging the Gap between Semantic Web and
Web 2.0. ACM Press, New York, 2007, pp. 57–70.
[18] F. Carcillo and L. Rosati, “Tags for citizens: Integrating top-down
and bottom-up classification in the turin municipality website,” in
Proceedings of Online Communities and Social Computing: Second
International Conference at HCI International, ser. Lecture Notes in
Computer Science, vol. 4564. Springer, 2007, pp. 256–264.
[19] T. Vander Wal, "Explaining and showing broad and narrow folksonomies," 2005.
In this paper we have presented the results of an ongoing research activity targeted at tagging the 3D worlds available on the net. The final goal of this research is to enhance low-level 3D information with semantic labels, for a full exploitation of the 3D information available on the web. In this work we focused on bottom-up folksonomic tagging, suggested as an approach complementary to the top-down ontology-based labelling described in a previous work.
Andrea De Lucia, Rita Francese, Ignazio Passero and Genoveffa Tortora
Dipartimento di Matematica e Informatica, Università degli Studi di Salerno, via Ponte don Melillo 1, Fisciano (SA), Italy
[email protected], [email protected], [email protected], [email protected]
Virtual worlds offer a multi-tiered communication platform for collaborating and doing business, providing a perception of awareness and presence that cannot be reached with e-mail, conference calls or other traditional communication tools.
Linden Lab has created its digital currency for the online exchange of goods and services: all payment processes are virtualized and can be managed using the solutions offered by Second Life. At present, several commercial organizations are exploring the SL world to support e-commerce. American Apparel, Adidas, Lacoste, Reebok and Armani, for example, have opened virtual shops on Second Life. Indeed, 3D representations are suitable for electronic commerce because they emulate the shopping layout and many shopping items, such as furniture, dresses, accessories, and so on. Exploiting the opportunities offered by the available virtual worlds enables organizations to obtain a simple setup that can be created at reduced cost and accessed by a large number of people, without using specific input devices. In addition, during real-life shopping customers often consult with each other about products. A multi-user environment offers users the possibility of collaborating while shopping, benefiting from each other's experiences and opinions [9][10].
In this paper we present the results of an ongoing project, the TA-CAMP project, which aims at providing the textile consortium of the Campania Region (Italy) with several services, focusing in particular on e-commerce and internationalization aspects. The TA-CAMP project offers a traditional web site to promote virtual expositions, and also offers an enhanced version of this service on Second Life, named TA-CAMP Life. In this direction we have proposed to the textile organizations of the Italian Campania Region a virtual expo.
Virtual Worlds are being increasingly adopted for global commerce and in the future will be used in fields including retail, client services, B2B and advertising. Their main advantage is the support provided to the user community for communicating while shopping. This paper describes a project aimed at providing virtual exhibitions on Second Life: the TA-CAMP Life virtual expo system, which is the result of the integration between a web virtual expo and its extension on Second Life. The back-end web-based system supports the generation of an exhibition on Second Life and organizes the expo by distributing the exhibitors' stands on an island and enabling each stand to dynamically expose multimedia contents.

Keywords: Second Life, virtual world, virtual expo, system integration, e-commerce.
Enterprise marketing and external exchanges have gained new opportunities from the development of network technologies. At present, there is growing interest in 3D worlds which, thanks to the technological evolution, are becoming more and more promising. Indeed, several worldwide organizations, such as IBM and Linden, are investing in this area. In particular, Linden Lab proposes Second Life (SL) [8], the most popular Internet-based Virtual World if measured by number of subscribers and money exchange [4]. In Second Life, as well as in other Virtual World platforms, it is possible to interact with the other users, represented by avatars, through voice chat, text chat, and instant messaging.
Second Life residents bought and sold more than $360m worth of virtual goods and services in 2008 [6].
Users are represented by avatars and interact with the environment by controlling the avatar actions. Second Life makes it possible to use the web, video and audio streaming, and VOIP. People can chat privately as well as publicly on an open channel.
As investigated in [1], movement in Second Life occurs in a natural manner and the user is able to control the events: he/she sees his/her avatar behaving as expected and the 3D world changing according to his/her commands. Animations and gestures are offered to augment face-to-face communication. Once in the environment, people have a first-person perspective: they participate, they do not only watch. Situational awareness, "who is there", is well supported, as well as awareness of "what is going on". Moreover, the user perception of awareness, presence and communication induced by the environment is in general very positive [1]. SL also offers the possibility to connect with external web pages and Internet resources.
Such a virtual expo enables the textile consortium to reach a wide user community. In this way the marketing message is promoted both in the traditional web world and in a continuously growing international economic context. The virtual exhibition on SL is automatically generated by creating a reception area and several stands, starting from the information available in the web version of the exhibition.
The rest of the paper is organized as follows: Section 2 introduces the main features of Second Life, while Section 3 describes how we have organized the exhibition on Second Life. Section 4 presents the architecture and the interaction modalities offered by TA-CAMP Life. Finally, Section 5 concludes.
Several works have been proposed in the literature aiming at supporting collaborative shopping; see [10] as an example. They often do not address scalability issues [9]. Peer-to-peer networked shopping CVEs have also been investigated in [9]. In that work, peer-to-peer is preferred to a virtual-world-based solution, based on the consideration that a group of buyers should contain at most a dozen people at a given time. A virtual expo, instead, should host many avatars at the same time, distributed among the various stands, which can also be grouped all together during specific events, such as discussion meetings or award ceremonies, scheduled by the expo organizers.
We decided to select Second Life to host the virtual expo features of the TA-CAMP project for several reasons, summarized as follows. SL offers a tridimensional and persistent virtual world, created by its "residents", in which a real economic system supporting the exchange of virtual goods and services is embedded.
SL hosts a community of over ten million users and, at any hour of the day, about 100,000 users are online concurrently. Avatars can build and sell things, such as clothing or airplanes, and these transactions can be paid using the Linden dollar currency, which makes economic activities in the digital space directly connected to the earth-based economy.
It is important to point out that Linden Lab has estimated the global market for virtual goods at $1.5bn a year.
Figure 1. Space organization of the TA-CAMP Life island (seven parcels of 4,900 sqm and one of 5,055 sqm).
In addition, IBM and Linden Lab are jointly developing an in-house version of Second Life for businesses, enabling enterprises to build secure virtual worlds that can be deployed behind a firewall [6].
A Participant Detector Object collects the data concerning the visitors and shows the number of participants.
The gadget distributor has been designed to collect the user preferences concerning models, colors and dress material. The users answer an implicit survey while customizing their gadget. A prototype of the Survey/Gadget Distributor is shown in Figure 3, where the user has chosen a pair of trousers as a gadget and is selecting their color and material. The gadget is also decorated with the expo logo. Once the gadget customization is finished, the user can wear the gadget or store it in his/her inventory.
The areas next to the reception are available to
support exhibition events, such as presentations with a
slide projector, as described in [3].
Users access SL using a client software that can be downloaded for free and is available for multiple platforms. Linden Lab maintains a network cluster to host regions of 3D virtual environments, the "islands". These islands contain user-created 3D content and can be interactively explored by the users logged into the SL system. The content in SL is protected by a digital rights management system.
SL is based on the archipelago metaphor, where space is organized in islands connected to each other via teleportation links, bridges, and roads. The island hosting the TA-CAMP Life virtual expo has been designed as shown in Figure 1 and is composed of a reception area and several stands.
The island has been designed to favor both direct access to a stand of specific interest and a continuous path organized in pavilions. Each pavilion is devoted to a specific category of marketable goods and contains ten stands. Second Life supports the use of web, video and audio streaming. People can communicate privately as well as publicly using the Second Life chat or VOIP, collaborating while shopping.
3.2 Stand Organization

Concerning the exposition stands, they are multi-part objects composed of at least 400 prims (the elementary building blocks of every Second Life object). Each island offers a limited number of prims, thus we have computed that each island can host at most 40 stands.
3.1 The reception area

The exhibition offers a unique access area, signaled by an arrow in Figure 1. This area represents the reception of the exhibition, where it is possible to consult the Exhibitors' Catalog.
Figure 3. The Survey/Gadget Distributor prototype in the Reception Area.
TA-CAMP Life offers two types of stand, depending on the availability of videos to be shown. In particular, it is possible to adopt a stand consisting of four areas (Presentation, Image, Video, Web) or one with three areas (Presentation, 2×Image, Web). In each area, in addition to the main communication channel (web, image or video), it is also possible to have an independent channel for audio diffusion.
Figure 2. The Exhibitors' Catalog.
In this environment several automatic distributors of the expo's gadgets, such as the shirt with the expo's logo, are also available, together with a Participant Detector Object. When several users in an area discuss using the chat, the text written by the users is saved in the web site back-end for further analysis, provided the users' permission.
When a user accesses a stand, he/she enters the Presentation Area, shown in Figure 4, where a web presentation of the firm and its products is displayed on a screen. He/she can always go back to the reception hall using a teleport-like direct link. In the Image Area it is possible to search the product catalog using the Index Board depicted in Figure 5 (a), to examine the detailed images of the selected product, Figure 5 (b), and its description, displayed on a board adjacent to the image projector, as shown in Figure 5 (c). There is also the possibility of accessing the front-end of the organization's electronic commerce web site to order the selected products.
The project requirements established that the virtual exhibition had to be offered in both web and Second Life modalities and that, once the database of the web version was populated, the SL exhibition had to be automatically generated. Users who access the web version of the expo using a browser are also invited to the expo in Second Life: by clicking on a Second Life link they are teleported to the expo areas of TA-CAMP Life.
Figure 6. The Video Area
The identified actors are:
• The system administrator, managing the virtual exhibition. This includes the definition of the start and end dates of the exhibition, the association of an exhibitor to a stand, the definition of exhibitor access rights, etc. These functionalities are supported by a web application.
• The exhibitor, managing his/her stand and the contents to be shown.
• The customer, visiting the expo and buying goods through the exhibitor's e-commerce web site.
The web and virtual world interaction modalities collect the needed contents by querying a common Content Management System (CMS), as illustrated in Figure 7, where an overview of the TA-CAMP Life system architecture, with the different components distributed over several servers, is shown. The SL Expo objects, such as projectors and Index Boards, reside on the Second Life Linden External Server. All these objects expose an active behavior obtained by using the programming language offered by SL, namely the Linden Scripting Language [5].
Figure 4. The stand Presentation Area.
Another area is also available to provide video contents, such as advertisements or fashion shows. Also in this case an Index Board, depicted in the left-hand part of Figure 6, makes it possible to select the video to show by touching the related text line.
Figure 5. The Image Area
The Web Area is organized in a similar way and provides access to the exhibitor's web site.
Communication involving the objects and the external world is performed using HTTP requests/responses, while intra-object communication relies on link or chat messages. A link message is adopted when sender and receiver are embodied in the same composite object. Chat messages may be exchanged among several objects in the same island. Different kinds of chat messages can be selected, depending on the sender and receiver distance. In addition, each chat message can be sent on a reserved channel in such a way as to have a unique receiver [7].
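The choice among these communication mechanisms can be summarized with a small sketch. The fragment below is only an illustrative model of the decision logic in Python, not actual Linden Scripting Language, and all object names are invented:

```python
from dataclasses import dataclass

@dataclass
class ExpoObject:
    name: str       # invented example names
    composite: str  # the composite object (e.g. a stand) the prim belongs to
    island: str     # the island hosting the object, or "out-world"

def channel_for(sender: ExpoObject, receiver: ExpoObject) -> str:
    """Pick the communication mechanism, following the rules above."""
    if sender.composite == receiver.composite:
        return "link message"        # prims embodied in the same object
    if sender.island == receiver.island:
        return "chat message"        # distinct objects on the same island
    return "HTTP request/response"   # communication with the external world

board = ExpoObject("Content Index Board", "stand12", "ta-camp")
button = ExpoObject("Page Button", "stand12", "ta-camp")
stand = ExpoObject("Stand", "stand12-root", "ta-camp")
cms = ExpoObject("CMS", "web-site", "out-world")

print(channel_for(button, board))  # link message
print(channel_for(board, stand))   # chat message
print(channel_for(board, cms))     # HTTP request/response
```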
To enable customers to play video contents during their visit, the Video Area is equipped with the Content Index Board, which displays the catalog of the multimedia contents associated with a specific stand. Once a content is selected, it is played on the Content Board. Figure 8 shows the in-world and out-world objects involved in displaying multimedia contents on the Content Board in Second Life. The Content Index Board exposes two objects: the Page Button, used to go forward and backward in the content list, and the Content Selector object. The Content Index Board requests the content details from the Stand object, which queries the Content objects. The Stand object returns this information to the Content Index Board for display. The Content Selector highlights the index element selected by a touch action and communicates its position to the Content Index Board, which, in turn, activates the Content Board. The latter sends an HTTP request to the identified resource out-world.
Figure 7. The system architecture (Resident Viewer, Web Browser, Streaming add-on plug-in, Database Node, and the SL Linden External Server hosting the Second Life Logic and the SL Expo objects).
The virtual exhibition is dynamically generated by collecting the data offered by the traditional web site. In particular, each SL Expo object is dynamically populated as follows: the SL Expo object sends an HTTP request to the CMS, specifying its stand identifier, to obtain the appropriate content to be displayed in Second Life. The CMS embeds the required information in an HTTP response towards the requesting SL Expo object. This mechanism makes it possible to obtain a new 3D exhibition each time the web site starts a new expo. An ad-hoc developed plug-in of the CMS, named Streaming add-on plug-in, communicates with a Darwin Streaming Server (DSS) component, integrated into the system to provide streaming capabilities to both the CMS and Second Life. It also provides the possibility to access, in a controlled manner, a variety of multimedia contents from an exhibition stand.
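The request/response mechanism described above can be sketched as follows. This is a minimal illustration only, not the actual TA-CAMP implementation: the stand identifier, parameter names, URLs and content table are all invented for the example.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical content table, populated from the web expo database.
CMS_CONTENTS = {
    "stand12": {
        "presentation": "http://example.org/firm12/presentation.html",
        "video": "rtsp://dss.example.org/firm12/spot.mov",
    },
}

def cms_response(request_url: str) -> str:
    """Answer an SL Expo object's HTTP request with the content
    registered for the stand identifier it declares."""
    query = parse_qs(urlparse(request_url).query)
    stand_id = query.get("stand", [""])[0]
    area = query.get("area", [""])[0]
    return CMS_CONTENTS.get(stand_id, {}).get(area, "no content")

# A stand's video projector would issue a request like this one:
print(cms_response("http://cms.example.org/expo?stand=stand12&area=video"))
```

Because the table is rebuilt from the web database, opening a new expo on the web site automatically yields a new 3D exhibition.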
Figure 8. In/out world communication in the Video Area (Content Index Board with Page Button and Content Selector, Content Board with Play and Pause buttons, and the out-world Multimedia Content, communicating via link and chat messages).

4.1 Accessing multimedia contents from Second Life
SL can show text only in terms of images. Chats can also be used to display textual information, but they are not suitable for showing large texts. Thus, to display textual contents on the boards we adopted the XyzzyText library [13], which makes it possible to create special elementary prims able to display a pair of letters on each face. By arranging these elementary blocks on the surface of a board it is possible to show multi-line text.
In this sub-section we describe how TA-CAMP Life accesses, in a controlled manner, a variety of multimedia contents during a visit to the Video Area of a stand. It is important to point out that SL technology offers land owners the ability to connect each land parcel to media content, which can consist of images, videos, audio or web pages. To exploit this feature, the multimedia materials have to be stored on an external server.
As an example, to show the content list on the Content List Board, the text to be displayed is requested from the Stand object out-world and then arranged using the XyzzyText library. An example of such a board is shown in Figure 2, where the exhibitors' catalog board, capable of displaying a matrix of 10 × 40 characters, is depicted.
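Since each XyzzyText prim face displays a pair of characters, laying out a board amounts to splitting the text into two-character cells row by row. The sketch below is a minimal illustration of this idea only (the real XyzzyText library is written in LSL; the board size follows the 10 × 40 example above):

```python
def layout_board(text: str, rows: int = 10, cols: int = 40) -> list:
    """Split text into lines of `cols` characters, then split each line
    into the two-character cells shown by single XyzzyText prim faces."""
    lines = [text[i:i + cols] for i in range(0, len(text), cols)][:rows]
    board = []
    for line in lines:
        line = line.ljust(cols)  # pad so every face receives two characters
        board.append([line[i:i + 2] for i in range(0, cols, 2)])
    return board

cells = layout_board("EXHIBITORS' CATALOG")
print(len(cells[0]))   # 20 two-letter faces per 40-character row
print(cells[0][:3])    # ['EX', 'HI', 'BI']
```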
[1]. Bonis, B., Stamos, J., Vosinakis, S., Andreou, J., Panayiotopoulos, T. (2007), Personalization of Content in Virtual Exhibitions, in the Proceedings of the 2nd International Conference on Semantics And Digital Media Technologies, pp. 172-184.
[2]. Celentano, A., Pittarello, F. (2004), Observing and Adapting User Behavior in Navigational 3D Interfaces, in the Proceedings of the Working Conference on Advanced Visual Interfaces, Gallipoli, Italy, pp. 275-282.
[3]. De Lucia, A., Francese, R., Passero, I., Tortora, G. (2009), Development and Evaluation of a Virtual Campus on Second Life: the case of SecondDMI. Computers & Education, Elsevier, Vol. 52, Issue 1.
In this paper we have described the main features of the virtual exhibition components of the TA-CAMP Life project, enabling two variants of a virtual expo, one web-based and the other based on the Second Life virtual world, to coexist. Using a unique database for both approaches, a complete virtual world expo can be automatically generated. It is important to point out that TA-CAMP Life does not replicate the remoteness and loneliness of a web exhibition. Even if it offers a product catalog as in its web version, TA-CAMP Life also promotes the social texture of a real exhibition, along with the collaborative nature of buying, and offers the exhibitors the possibility of organizing synchronous events. The system also provides survey features and sensors to examine user behavior and collect information useful to foresee marketing trends. We plan to use this information, together with other data concerning user behavior, to anticipate users' needs in forthcoming interactions, investigating the differences between adaptation in a multi-user environment and similar approaches proposed for single-user environments, such as [1][2]. Future work will also be devoted to investigating how to adopt the functionalities offered by SL for controlling the avatars, integrating virtual agents into TA-CAMP Life. In this way it will be possible to support customer care, following the directions traced in [10].
[4]. Edwards, C., Another World. IEEE Engineering &
Technology, December 2006.
[5]. Linden
[6]. Nichols, S., IBM to build corporate Second Life
[7]. Rymaszewski, M., Au, W. J., Wallace, M., Winters, C., Ondrejka, C., Batstone-Cunningham, B., Rosedale, P., Second Life: The Official Guide. Wiley Press, 2007.
[8]. Second Life.
[9]. Khoury, M., Shirmohammadi, S., Accessibility and ... Environments. International Journal of Product Lifecycle Management, Vol. 3, pp. 178-190, 2008.
[10]. Shen, X., Shirmohammadi, S., Desmarais, C., Georganas, N. D., Kerr, I., Enhancing e-Commerce with Intelligent Agents in Collaborative e-Communities, in the Proceedings of the IEEE Conference on Enterprise Computing, E-Commerce and E-Services, San Francisco, CA, USA, IEEE, June 2006.
[11]. Shen, X., Radakrishnan, T., Georganas, N., vCOM: Electronic commerce in a collaborative virtual world. Electronic Commerce Research and Applications 1 (2002), Elsevier, pp. 281-300.
This research has been supported by Regione Campania, funding the TA-CAMP project.
[12]. Williams, I., Linden Lab expands e-commerce in
[13]. XyzzyText,
Genòmena: a Knowledge-Based System for the Valorization of Intangible
Cultural Heritage
Paolo Buono, Pierpaolo Di Bitonto, Francesco Di Tria, Vito Leonardo Plantamura
Department of Computer Science – University of Bari
Via Orabona 4, 70125 Bari
{buono, dibitonto, francescoditria, plantamura}
The Italian nation is famous for its history and cultural heritage. Artefacts and cultural treasures dating back to various periods of the past are often preserved in museums, but traditions, dialects and cultural events are examples of intangible heritage from the past that cannot be kept in museums. They are the basis of current cultures, but nevertheless their historical memory tends to disappear, since it is difficult to preserve for the new generations. In this paper we present Genòmena, a system that has been designed to store and preserve intangible cultural heritage, thus saving it for posterity. Genòmena allows different types of people to access such intangible heritage via a Web portal. Thanks to its underlying knowledge base, it is possible to obtain information in different forms, such as multimedia documents, learning objects and event brochures.
Museums and archeological parks preserve much of the ancient heritage, but traditions, dialects, and cultural and religious events are examples of intangible heritage that are difficult to maintain for posterity [2].
The Genòmena system has been developed to preserve and recover the ancient traditions of the people of the Puglia region. The name derives from the ancient Greek word γενόμενα, which means events. As will be described in the paper, the system provides information to various types of people and in different forms, namely multimedia documents, learning objects and event brochures.
Genòmena has three main objectives: 1) to foster the
dissemination of intangible heritage in order to keep its
historical memory alive; 2) to promote tourism in the
Puglia region by providing detailed information about
items of intangible heritage; 3) to support research on
cultural heritage.
One of the peculiar features of Genòmena is that it offers the possibility of performing very advanced data searches. In fact, the underlying knowledge base makes it possible to retrieve information on the basis of semantic as well as spatial and temporal relationships among the stored items.
The paper has the following organization. Section 2
briefly describes related work. Section 3 presents the
system architecture and the main users of Genòmena.
Section 4 describes our novel approach to help users to
find relevant information, based on an ontological
representation and on a knowledge-based search agent.
Finally, some conclusions are reported.
1. Introduction
The variety of people's cultures is the result of a long
evolution that, during the course of centuries, transforms a
territory and the customs and traditions of its inhabitants.
History is not only written in great literary works, but is
also preserved through traditions, dialects, etc., which all
contribute to people's culture and cultural heritage. Only
through the study and preservation of this heritage can the
memory of a territory and its inhabitants be kept alive and
appreciated in the present time.
The 2003 Convention for the Safeguarding of the
Intangible Cultural Heritage defines the intangible cultural
heritage as “the mainspring of our cultural diversity and
its maintenance a guarantee for continuing creativity” [1].
Intangible cultural heritage is manifested in domains such as: oral traditions and expressions, including languages and dialects as a vehicle of the intangible cultural heritage; performing arts, e.g. traditional music, dance and theatre; social practices, rituals and festive events; and traditional craftsmanship.
Since the time of "Magna Grecia" (8th century BC), Italy, and especially the Puglia region, has been a crossroads of peoples coming from the Mediterranean basin (and not only). Puglia underwent several periods of foreign domination and was the site of many important pilgrimages to visit the relics of Saint Nicholas, one of the most revered saints of all Christendom.
2. Related work
Genòmena is a novel system that, among its various goals, aims at supporting the exploration of relationships among several cultural heritage documents. Other systems have been built for this purpose. PIV is a system that allows users to search for documents related to Pyrenean cultural heritage [13]. PIV is based on Web services and allows people to retrieve documents according to a geographic search. It is equipped with both a content-based search engine and a semantic engine. The semantic engine is integrated with a geographical database that is able to search for spatially related documents. The results are visualized using a cartographic representation in which each document is represented by a point near the place it evokes.
Case-based engines, instead, adopt the assumption that a new problem can be resolved by retrieving and adapting the solutions found for similar, already stored cases.
An example of a rule-based engine can be found in [18], where an expert system gives search results about hotels, providing the reasons for the selected items. An example of a case-based engine can be found in [19], where the authors describe the Entree system, which is able to suggest restaurants. On the basis of the information inserted by the user, the system selects from its knowledge base a set of restaurants that satisfy the user's preferences. Finally, the system sorts the retrieved restaurants according to their similarity with the current case. The Genòmena system acts as a rule-based engine.
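The rule-based behaviour mentioned above can be illustrated with a minimal sketch. The rules, items and structure below are invented for the example and are not taken from the actual Genòmena implementation; they only show how a rule-based engine can both select items and justify the selection:

```python
# Each rule is a predicate plus an explanation, so that results can be
# justified, as in the hotel expert system cited above. All rules and
# items here are invented for illustration.
RULES = [
    (lambda item, q: q["period"] in item["periods"],
     "matches the requested historical period"),
    (lambda item, q: q["area"] == item["area"],
     "is located in the requested geographic area"),
]

ITEMS = [
    {"name": "Tarantella festival", "periods": ["modern"], "area": "Puglia"},
    {"name": "Greek dialect texts", "periods": ["ancient"], "area": "Puglia"},
]

def search(query):
    """Return the items satisfying every rule, with the reasons why."""
    results = []
    for item in ITEMS:
        if all(rule(item, query) for rule, _ in RULES):
            results.append((item["name"], [why for _, why in RULES]))
    return results

print(search({"period": "ancient", "area": "Puglia"}))
```

A case-based engine would instead rank all stored cases by similarity to the query rather than filter them through explicit rules.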
The visualization of data that have inherent spatio-temporal information on the Web is not an easy task. This is confirmed by the study performed by Sutcliffe et al. [16]. Several ways of presenting the results of a query have been adopted. Yee et al. propose a visualization based on facets [17]. We were inspired by this work and present the results of the advanced search engine using a multidimensional approach. The visualization is dynamic and provides the possibility to apply filters. Figure 1 shows a dynamic web page, split into three areas. The top left area contains information on the search engine, subdivided into general items, geographic areas and time, respectively. The bottom left area shows a tree containing the IICH documents, brochures and learning objects that are correlated according to the search results, as presented in Section 4. The right area presents the details of the retrieved (and filtered) items.
The system does not retrieve temporally related
documents (e.g. documents written in the same period).
The P.I.C.A. project aims at preserving and valorizing the
cultural heritage of the Po Valley and the Western Alps [14].
The system has
been developed in order to allow users to access cultural
documents related to this territory. It is equipped with an
XML-based search engine that retrieves documents by
using both traditional keywords based searches and
graphic maps. The extracted documents are visualized as
cards describing specific items (e.g. monuments). Graphic
maps show topographic information, thanks to interaction
with the MapServer. Also in this case, the user can only
browse documents according to spatial criteria.
An interesting system is T.Arc.H.N.A., which provides
cultural contents through a narrative visualization of items [20].
The narrations, composed of XML files and visualized as
multimedia contents, are searched for by archaeologists
using the Narration Builder, a search engine that
generates queries to be sent to different databases,
containing documents about Etruscan cultural heritage.
Meyer et al. introduce the Virtual Research Environment,
a Web-based search engine that allows users to perform
spatial and/or temporal explorative analyses [15]. This
engine is able to perform advanced searches, creating
queries that combine temporal and geographic criteria.
This system allows users to perform studies of the history
of a territory and a virtual visit of a site. Lastly, the search
engine provides keyword- and image-based searches,
since all the multimedia objects are described by
metadata. The visualization of the retrieved documents is
based on both interactive maps, which allow a virtual
exploration of a territory, and 3D models, that allow
access to documents referencing a given place at a given
period of time. However, the data are stored in relational
databases and XML files, and there is no ontological
representation of the domain of interest, preventing
semantic searches.
The semantic search is based on explicit knowledge
representation and can reveal every kind of relationship by
using inferential processing.
Knowledge-based search engines use their knowledge
about the user and items in order to generate suggestions,
by reasoning on which items satisfy the user requests.
These systems fall into two categories: rule-based and case-based.
Figure 2 Genòmena system
3. System architecture
Genòmena is a modular, distributed system that includes
web applications, web services and several databases.
As shown in Figure 2 the main entrance of the system is
the Genòmena portal, which allows people to access the
Events browser, the search engine, and the Brochure browser.
Information on the system is shown according to different
user permissions, managed by the User Manager Web
service. The system provides an advanced search only to
Figure 1 The visualization of search results in Genòmena
The rule-based engines use a set of rules to infer
correlations among different items.
registered users, and allows content management only by
system administrators or cataloguers, as will be seen later
in the paper.
The Advanced Search Engine finds relationships among
Items of Intangible Cultural Heritage (IICHs). In order to
produce the search results it interacts with IntelliSearcher,
which is a knowledge-based search engine whose aim is to
find items that are related by semantic relationships.
Personalized search results are provided by
IntelliSearcher, exploiting the Profile Matcher, which
assigns a score to each resource found, according to the
user profile.
Registered users may get information not only as IICH
documents and event brochures but, since the system also
manages learning objects on topics related to intangible
cultural heritage, they can also access on-line courses
provided by the Moodle web application. Teachers
organize these courses by assembling a set of learning
objects imported through eTER, a web application that
permits the upload of learning objects described by
metadata based on IEEE LOM [5] and fEXM [11].
The cataloguer inputs all IICH data through the IICH
manager web application.
Events Manager is a decision support system that assists
event organizers in planning events. Events data are stored
in the IICH database through an ETL process [12].
Therefore,
other users of the system are the local inhabitants, who
are interested in information about local events, religious
traditions, multimedia items like photos, video and oral
stories. The catalogued items can also be an object of
study by school children, who are mainly interested in
short on-line courses related to history, religion and their
connections with the territory. For this purpose, the
system is integrated with a Learning Management System
(LMS) which manages learning resources, related to the
most important cultural items classified in the repository.
The e-learning environment increases the possibility of
sharing of the resources, providing on-line courses to be
accessed at any time from any location. Such courses are
organized by teachers working with the Open Source
e-learning platform Moodle, which is integrated in the
Genòmena system.
Tourism promotion is strictly related to cultural
dissemination. Tourists may benefit from cultural items
and plan customized paths in order to improve their
knowledge about the habits and the traditions of the cities
they want to visit.
There are different kinds of tourists. The business traveler
typically looks through images, searching for event
schedules, city maps and traditional cooking. Another group
of tourists popular in Puglia comprises those interested in religion,
since the region is full of important churches and religious
monuments. Such tourists are mainly interested in paths
and journeys proposed by church organizations.
Genòmena also provides the possibility of organizing
special events related to IICHs through the Event Manager
module, used by event organizers. Finally, there are other
people who work behind the scenes, specifically the
users who maintain the whole system.
Genòmena has been designed to support all these user
categories, which have been analyzed in depth in order to
develop a system that supports their needs and
expectations, according to a user-centred approach.
Users can search for content and browse several types of
documents. Currently, three types of documents are
supported: multimedia documents structured according to
the ICCD standard for describing an IICH (called IICH
document in the rest of the paper), event brochures, and
learning objects. These documents can be accessed in
different ways, each providing contents with different
3.1. Genòmena users
Genòmena is a system designed to manage items of
intangible cultural heritage, in order to preserve their
memory. Thus, its main objective is to support the
dissemination of stored information to all citizens, ranging
from school children to senior people. Genòmena is also a
great source of information for researchers working on
cultural heritage and is intended to support tourists
visiting the Puglia region.
The users accessing the system are very different, and
interested in getting information on different aspects of
the same item. For example, a student interested in the
traditions of his own territory can access learning objects,
which explain information about a certain item by using a
didactic approach; the tourist, who is interested in cultural
aspects related to religion, gastronomy, etc., gets
information about events such as trade fairs, religious
events, shows, and can access brochures concerning the
requested event; the researcher, who might be interested in
getting anthropological and/or philological data, can
review documents, and technical material, written
according to the Italian Central Institute for Cataloguing
and Documentation (ICCD) standard, which contains
useful details [3, 4].
In order to adequately support users’ requests, all the
available material must be stored and organized in a
structured way, in order to facilitate their retrieval and
The main users of Genòmena, who work with the system
for either inputting data or for retrieving them, are the
following. The cataloguer, who is very familiar with the
ICCD standard and inputs data describing an IICH
according to this standard. The researcher, who is
interested in items related to history and cultural heritage.
As we have said, the main objective of Genòmena is to
disseminate knowledge about intangible items.
4. Finding relevant information
In order to represent the information about intangible
cultural heritage in the system, an in depth study of the
domain was conducted in collaboration with cultural
heritage experts.
As shown in Figure 3, the system knowledge base
distinguishes three types of knowledge: factual, specific,
and general.
The factual knowledge describes different items of
cultural heritage and is stored in the database of the
system. Examples of the factual knowledge are IICH
documents, Learning Objects, event brochures.
The specific knowledge describes the geographic and
historical context of the single item of factual knowledge,
providing specific spatial-temporal relationships. It is
represented in ontological form according to OWL syntax.
For instance the IICH document about the relics of Saint
Nicholas is related to the history of the Saint.
The general knowledge is the basic knowledge used to
build specific knowledge in order to carry out the
inference process within the KB. The general knowledge
describes the historical context of the specific knowledge
and is represented in ontological form. For instance, the
specific knowledge about the saint’s history is
contextualized in the history of Christianity, or the
specific knowledge about different people’s traditions is
contextualized in the history of the people. The general
knowledge, in the example, covers a period that goes
from the Christian period to the present day and represents
traditions, cultures, dominations, religions. The
knowledge representation used by the system for
providing suggestions to users is reported in detail in the
next section. The system knowledge is formalized in order
to explain how it can provide suggestions for searches.
historical procession there are a lot of actors, such as
knights, jugglers, tumblers, and so on); (j)
audio/video/photo document, that stores the links and the
descriptions of the multimedia content related to the item;
(k) element specification, that contains further information
about the item; (l) data access, that points out the item
copyrights; (m) writing mode, that stores the name of the
expert cataloguer of the item and the date of cataloguing;
(n) features that are indicative of the kind of events related
to the item.
An example of IICH is “the over the sea procession of
Saint Nicholas’ statue”. In the system, this IICH is
represented according to the ICCD standard. In this case,
only eight of the sixteen descriptors are necessary.
Specifically, this item has the following descriptors
activated. Code (a): 1601000005. Definition (b): “vessel’s
statue procession in the sea”. Geographic location (c):
there are various details such as country, city, etc. In this
case country is Italy and city is Bari. Time period (d):
May 7th. Analytical data (f): in this section there is a long
description about the intangible cultural heritage item.
Element specification (k): Rituals and traditional festive
events. Access data (l): no privacy or security limitation.
Writing mode (m): Archive.
As regards the learning objects, they are described using
IEEE LOM [5]. The event brochures are described by the
name of the event, the schedule of sub-events, sponsors
supporting the event organization and the mass media
advertising the event. Each learning object and event
brochure refers to one or more IICH documents, so in the
search process the system finds not only an item of IICH
document but also related learning objects and brochures.
The system represents the specific and general knowledge
using the same representation model, based on objects,
with properties and relationships, using OWL language
[6]. In particular, the relationships are expressed in terms
of time and space.
The spatio-temporal representation has raised several
research questions, for instance: how to define the same
religious worship that takes place in different times and in
different geographic areas; how to define the same title
borne by different persons, i.e. the king of France is
represented by different people, according to the specific
moment in time we are considering, and so on. The
problem has been solved by using the event calculus, an
evolution of the situation calculus, which permits an event
to be considered as a spatio-temporal portion [7]. Using
this technique it is possible to generalize the concept of
event as a space-time portion rather than just an event in
time. A set of functions, predicates and rules was thus
defined, on which space-time reasoning is based. For
instance, the following definitions have been made:
- Occurrence(e, t): indicates that the event e occurred at time t;
- In(e1, e2): indicates the spatial projection of event e1 inside another space e2 (e.g. In(Rome, Italy));
- Location(e): indicates the smallest place that completely covers event e (e.g. Location(relicX) = ChurchY);
- Start(e): indicates the first moment of time of the event;
Figure 3 Three kinds of knowledge involved in the
intelligent search process
4.1. System knowledge
The factual knowledge consists of IICH documents,
learning objects and event brochures. The factual
knowledge objects are shown in Figure 3. Each intangible
cultural heritage document contains data structured
according to the ICCD standard and is stored with
learning objects and event brochures. An IICH document
describes an item of intangible cultural heritage and is
composed of the following macro-descriptors: (a) codes,
that represent the identifiers of the items at regional level;
(b) definition, that contains the description of the item and
its membership category; (c) geographic location, that
describes where the item is located, specifying nation,
region, province, and city; (d) time period, that indicates
the period of the year when the item happens; (e)
relationships, that contain the references to the related
items; (f) analytical data, that contain a detailed
description of the item; (g) communication, that describes
the kind of communication (such as vocal and/or
instrumental) that accompanies the item; (h) individual
actor, that indicates the presence of a single person in the
item (for instance a ballad singer that tells the traditional
tales); (i) joint actor, that indicates the presence of a set of
people with their respective roles (for instance in a
representation shown in Figure 3.
An IICH document about Saint Nicholas relics is
contextualised in the ontology that describes the life and
the work of the Saint. The specific knowledge is
contextualised in the time-space dimension in the general
knowledge.
Let us suppose that a cultural heritage researcher,
interested in the history of Saint Nicholas, defines as a
search criterion the following string: “relics of Saint
Nicholas”.
The system initially finds the IICH document
related to the search string in the factual knowledge. On
the basis of the data contained in the retrieved IICH
document, the following facts are asserted in the
knowledge base and added to the ontology (of Saint
Nicholas) in the specific knowledge:
1. In 1087 sailors of Bari stole some of the bones of
Saint Nicholas
2. In 1100 sailors of Venice stole other bones of Saint
Nicholas
3. Some bones of Saint Nicholas are in San Niccolò
Lido Church
4. Some bones of Saint Nicholas are in Saint
Nicholas Cathedral
Moreover, in order to join the specific knowledge about
Saint Nicholas with the other specific knowledge the
inferring process uses the general knowledge. In the
example, the following facts are asserted:
San Niccolò Lido Church is in Venice Lido
Saint Nicholas Cathedral is in Bari
Saint Nicholas is patron of Bari
Venice Lido is in Venetian territory
San Marco is patron of Venice
San Marco Cathedral is in Venice
San Marco relics are in San Marco’s Cathedral
Thanks to this process, the system can make the following
logical deduction:
San Marco and Saint Nicholas are correlated
These new inferred facts represent the result of the
inferring process. In this way, the system shows the IICH
document related to both San Marco relics and Saint
Nicholas relics because both of them are kept in churches
that are spatially close.
- End(e): indicates the end of the event;
- Meets(e1, e2): establishes that two events are consecutive
if the instant when the first one ends is the one when the
second one starts.
These predicates and functions allowed us to define the
relations highlighting analogies among fragments of
knowledge. There are three different types: time, space
and concept.
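The spatial side of these predicates can be sketched in a few lines of Python. This is a toy illustration only: the place facts and item names are hypothetical, not the actual Genòmena knowledge base, and "spatially close" is simplified here to "sharing an enclosing place".

```python
# Toy sketch of the spatial predicates (Location, In) described above.
# All facts and names are illustrative, not the real knowledge base.

# Location(e): the smallest place that completely covers item e
location = {
    "SaintNicholasRelics": "SaintNicholasCathedral",
    "SanMarcoRelics": "SanMarcoCathedral",
}

# In(p1, p2): direct spatial containment facts
contained_in = {
    "SaintNicholasCathedral": "Bari",
    "SanMarcoCathedral": "Venice",
    "Bari": "Italy",
    "Venice": "Italy",
}

def enclosing_places(place):
    """Transitive closure of In: `place` plus all places containing it."""
    result = {place}
    while place in contained_in:
        place = contained_in[place]
        result.add(place)
    return result

def spatially_correlated(item1, item2):
    """Two items are correlated when some place encloses both locations."""
    return bool(enclosing_places(location[item1]) &
                enclosing_places(location[item2]))

# Both relics are kept in churches enclosed by Italy, so they correlate.
print(spatially_correlated("SaintNicholasRelics", "SanMarcoRelics"))  # True
```

A real implementation would reason over the OWL ontology rather than Python dictionaries, but the containment closure is the same idea.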
4.2. The search process
The knowledge representation is used by the system in
order to suggest relevant contents that are related to the
user’s query, which is a string inserted by the user.
The search process is composed of three main phases:
Lexical enrichment of the search string: the string
inserted by the user is parsed and completed using the
lexical database MultiWordNet [8, 9]. In this phase
the query string is tokenized and formatted for the
information retrieval process. The terms in the query
string are enriched with synonyms taken from the
MultiWordNet database.
Search and selection of the relevant IICH documents:
starting from the enriched query string, the relevant
documents are retrieved from the factual knowledge. For
each term of the string, a list of IICH documents, ranked
by relevance, is produced.
Suggestions: the system computes correlations of
each selected IICH document with other IICH
documents, using the specific and general knowledge.
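The three phases above can be sketched as follows. This is a toy version: the synonym table stands in for MultiWordNet and the indexed documents are invented for illustration.

```python
# Toy sketch of the three search phases: lexical enrichment,
# retrieval, and ranking. Synonyms and documents are illustrative.

SYNONYMS = {"relics": ["remains"], "saint": ["st."]}

DOCUMENTS = {
    "doc-sn-relics": "the relics of saint nicholas are kept in bari",
    "doc-sm-relics": "the remains of san marco rest in venice",
    "doc-procession": "the sea procession of the saint nicholas statue",
}

def enrich(query):
    """Phase 1: tokenize the query string and add synonyms to each term."""
    terms = set(query.lower().split())
    for term in list(terms):
        terms.update(SYNONYMS.get(term, []))
    return terms

def retrieve(terms):
    """Phase 2: rank documents by the number of query terms they contain."""
    scores = {doc_id: len(terms & set(text.split()))
              for doc_id, text in DOCUMENTS.items()}
    return sorted((d for d in scores if scores[d] > 0),
                  key=lambda d: scores[d], reverse=True)

ranked = retrieve(enrich("relics of Saint Nicholas"))
print(ranked[0])  # the document matching the most enriched terms
```

Phase 3 (suggestions) would then run the inferential step of the previous section over the retrieved documents.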
In the suggestion phase, thanks to the information on IICH
documents found, together with the specific and general
knowledge represented via event calculus and the
ontology (stored in OWL format), a run-time knowledge
base is generated. Concepts, instances and properties of
the ontology needed to be formalized in declarative
language: in particular, a hierarchical representation of the
concepts and the properties of the ontology is stated as
rules. The instances are inserted in the database in the
form of facts. After creating the database, the goals for
determining the IICH documents to be suggested were
defined. In this way it is possible to combine various types
of relations (e.g. contemporary, neighboring events, …) in
order to suggest the most relevant IICH.
The result of this process is a list of IICH documents,
which have spatial and temporal relationships according to
the initial search string. Moreover, using the relationships
in the factual knowledge, the system provides a list of
learning objects and event brochures related to the
retrieved IICH documents. The output is then organized
by the profiling system, that ranks and orders the results
according to the needs of the specific user interacting with
the system.
Figure 4 Part of ontology describing a religious event
For a better understanding of the working logic let us
suppose that a user finds an IICH document referencing an
event related to the life of a Saint and that the user is
interested in further events that happen in the same
moment as this event. Two kinds of temporal relationship
4.3 Inferring process: an example
In order to understand how the relationships among the
objects are used in the inferring process, an example of the
knowledge base is presented, according to the knowledge
[7] Russell S. J., Norvig P., Artificial Intelligence: A Modern
Approach. Prentice Hall. NJ: Upper Saddle River. 2003.
have to be considered: the first defines the exact matching
of two or more events during time; the second defines the
temporal analogy among past events. For example, on the
6th of December of every year, there is the celebration of
Saint Nicholas. On the basis of the first temporal
relationship, the user finds further cultural events that
happen in the same period of the year. On the other hand,
thanks to the second relationship, (s)he is also able to find
events like the old winter celebration, that, some centuries
ago, happened exactly on the 6th of December [10]. The
added value of the knowledge based search consists of
semantic relationships discovered automatically. In Figure
4, the class diagram shown reports a part of the ontology.
[8] Pianta E., Bentivogli L., Girardi C. (2002). MultiWordNet:
Developing an Aligned Multilingual Database. Proc. of the First
International Conference on Global WordNet. Mysore, India, 21-25 January, pp. 293-302.
[9] Bentivogli L., Forner P., Magnini B., Pianta E. (2004).
Revising WordNet Domains Hierarchy: Semantics, Coverage,
and Balancing, Proc. of COLING 2004 - Workshop on
Multilingual Linguistic Resources. Geneva. Switzerland, 28
August 2004, pp. 101-108.
[10] Jones C. W. (1978). Saint Nicholas of Myra, Bari, and
Manhattan: Biography of a Legend. Chicago and London:
University of Chicago Press.
[11] Roselli T., Rossano V. (2006). Describing learning
scenarios to share teaching experiences. International
Conference on Information Technology Based Higher Education
and Training. IEEE Computer Society Press. Sydney. Australia.
10-13 July 2006, pp. 180-186.
This paper has presented the Genòmena system, which is
designed to manage intangible cultural heritage and to
support its preservation and valorization in order to keep
alive the memory of a territory and its inhabitants. Indeed,
one of the main novelties of Genòmena is its search
engine, that exploits ontological representations and
makes it possible to perform advanced searches, so that
information is retrieved on the basis of various
relationships among the stored objects. Moreover, the
system uses a semantic engine that is able to find spatial,
temporal and categorical relationships among items of
intangible cultural heritage. The results are presented
using a multidimensional dynamic Web interface that
allows users to refine the output and analyze a subset of
retrieved documents.
[12] Kimball R. (2004). The Data Warehouse ETL Toolkit:
Practical Techniques for Extracting, Cleaning, Conforming, and
Delivering Data. John Wiley & Sons.
[13] Marquesuzaà, C., Etcheverry., P. (2007). Implementing a
Visualization System suited to Localized Documents. Fifth
International Conference on Research, Innovation and Vision for
the Future. P. Bellot, V. Duong, M. Bui, B. Ho (eds.). SUGER,
Hanoi. Vietnam. 05-09 March 2007, pp. 13-18.
[14] Agosto E., Demarchi D., Di Gangi G., Ponza G. (2005). An
open source system for P.I.C.A. a project for diffusion and
valorization of cultural heritage. CIPA 2005. XX International
Symposium On International Cooperation to Save the World's
Cultural Heritage. Torino, Italy. 26 Sept. - 1 Oct. 2005, pp. 607-611.
This work is supported by the Genòmena grant, provided
by the Puglia Region. We would like to thank Prof. Maria
F. Costabile and Prof. Teresa Roselli for the useful
discussions during the development of this work. We also
thank the students N. Policoro, M. Gadaleta, G. Vatinno,
and M. T. Facchini for their contribution to the system
[15] Meyer E., Grussenmeyer P., Perrin J. P., Durand A., Drap P.
(2007). A web information system for the management and the
dissemination of Cultural Heritage data, Journal of Cultural
Heritage, vol. 8, no. 4, Sept. - Dec. 2007, pp. 396-411.
[16] Sutcliffe, A. G., Ennis, M., and Watkinson, S. J. (2000).
Empirical studies of end-user information searching. Journal of
the American Society for Information Science. Vol. 51, no.13,
(Nov. 2000), 1211-1231.
[1] UNESCO Web site about
Intangible Cultural Heritage. Last access on March 2009.
[17] Yee, K., Swearingen, K., Li, K., and Hearst, M. (2003).
Faceted metadata for image search and browsing. Proc. of the
SIGCHI Conference on Human Factors in Computing Systems
CHI '03. Ft. Lauderdale, Florida, USA, April 05 - 10, 2003.
ACM, New York, NY, 401-408.
[2] Lupo E., Intangible cultural heritage valorization: a new field
for design research and practice. International Association of
Societies of Design Research, Emerging Trends in Design
Research. Hong Kong Polytechnic University, 12-15 November
[18] Gobin B. A., Subramanian R. K. (2007). Knowledge
Modelling for a Hotel Recommendation System. Proc. of World
Academy of Science, Engineering and Technology. Vol. 21 Jan.
2007. ISSN 1307-6884.
[3] Aiello A., Mango Furnari M., Proto F., ReMuNaICCD: A
formal ontology for the Italian Central Institute for Cataloguing
and Documentation, Applied Ontology, vol. 3, 2006.
[19] Lorenzi, F., Ricci, F. (2005). Case-based recommender
systems: a unifying view. In: Intelligent Techniques in Web
Personalisation. LNAI 3169. Springer-Verlag.
[4], Central Institute for Cataloguing
and Documentation, last access on March 2009.
[5] Learning Technology Standards Committee of the IEEE.
Draft Standard for Learning Object Metadata in IEEE-SA
Standard 1484.12.1, files/LOM_1484_
12_1_v1_Final_Draft.pdf. Last access on March 2009.
[20] Valtolina, S., Mussio, P., Bagnasco, G. G., Mazzoleni, P.,
Franzoni, S., Geroli, M., and Ridi, C. (2007). Media for
knowledge creation and dissemination: semantic model and
narrations for a new accessibility to cultural heritage. Proc. of
the 6th ACM SIGCHI Conference on Creativity & Cognition.
Washington, DC, USA, June 13 - 15, 2007. C&C '07. ACM,
New York, NY, 107-116.
[6] McGuinness D. L., van Harmelen F., OWL Web Ontology
Language Editors, /REC-owlfeatures-20040210. 2004. Last access on March 2009.
Video Quality Issues for Mobile Television
Carlos D. M. Regis, Daniel C. Morais
Raissa Rocha and Marcelo S. Alencar
Mylene C. Q. Farias
Institute of Advanced Studies in Communications (Iecom)
Federal University of Campina Grande (UFCG)
Campina Grande, Brazil
Email: {danilo, daniel, raissa, malencar}
Abstract—The use of mobile television requires the reduction
of the image dimension, to fit on the mobile device screen. The
procedure relies on space transcoding, which can be done in
several ways, and this article uses down-sampling and filtering
to accomplish this. Sixteen types of filter are presented to reduce
the spatial video resolution from the CIF to QCIF format for
use in mobile television. The objective (PSNR and SSIM) and
subjective (PC) methods were used to evaluate the quality of
the transcoded videos. The subjective evaluation, which used
the H.264 encoder to reduce the bit rate and temporal resolution
of the video, was carried out on a cellular device.
Index Terms—Mobile television, Performance evaluation,
Quality of video, Coding and processing, Transcoding.
Institute of Advanced Studies in Communications (Iecom)
Federal University of São Paulo (Unifesp)
São José dos Campos, Brazil
Email: [email protected]
In a digital television scenario the video signal may have
different bit rates, encoding formats, and resolutions. Figure 1
illustrates a block diagram of the transcoding process [5].
The video transcoder converts a video sequence to another
one, including coding with different temporal and spatial
resolutions and bit rates. The transcoding also saves space
and production time, because only the content with maximum
resolution is stored.
Mobile television is a technology that allows the transmission of television programs or video to mobile devices, including cell phones and PDAs. The programs can be transmitted
to a particular user in a certain area as a download process,
via terrestrial broadcasting or satellite. The telecommunication operators offer video services using Digital Multimedia
Broadcast (DMB), Integrated Services Digital Broadcasting
Terrestrial (ISDB-T), Qualcomm MediaFLO, Digital Video
Broadcasting – Handheld (DVB-H) [1], [2] and Digital Video
Broadcasting – Satellite (DVB-SH) [3]. The Integrated Services Digital Broadcasting Terrestrial Built-in (ISDB-Tb) standard defines the reception of video signals in various formats
for fixed or mobile receivers, with simultaneous transmission
using the compression standards MPEG-2 and H.264 [4].
Table I shows a comparison of mobile television technologies based on broadcasting transmission.
[Table I, reconstructed placeholder: for each technology the table reports the video and audio coding (MPEG-4 or WM9 video with AAC or WM audio; MPEG-4 video with BSAC audio; MPEG-4 video with AAC audio), the transport and modulation (IP datacast with QPSK or 16-QAM and COFDM; QPSK, 16-QAM or 64-QAM with FDM) and the RF bandwidth (5-8 MHz, 1.54 MHz, 433 kHz).]
Fig. 1. The cascaded pixel domain transcoder architecture to reduce the spatial resolution.
The cell phones present several physical limitations when
compared with traditional television equipment. The main
restrictions are the battery life, lower processing capacity,
memory capacity and the small display. Those restrictions
impose limitations on the video formats that can be played on
a mobile phone or any other device for mobile reception. The
length and width of the video (spatial resolution), for example,
must fit the small display of the mobile phone. If the video
signal is larger than the resolution of the display, the content
is not easily seen by the users.
One option is to reduce the size of the video at the receiving
device, but this means an increase in the computational load,
which is not feasible because of the limited processing ability
of mobile phones. Moreover, more processing implies an
increase in energy consumption.
This paper presents a comparison among different types of
spatial transcoding methods, which are intended for mobile
receivers. The quality issues are discussed and a quantitative
performance analysis is presented for objective and subjective
video quality metrics.
The transcoding process can be homogeneous, heterogeneous,
or use some additional functions. The homogeneous
transcoding changes the bit rate and the spatial and temporal
resolutions. The heterogeneous transcoding performs the
conversion of standards, and also converts between the
interlaced and progressive formats. The additional functions
provide resistance against errors in the encoded video
sequence, or add invisible watermarks or logos [6], [7].
Figure 2 represents a diagram with the various transcoding
functions.
Fig. 2. Transcoding Functions.
There are two major transcoder architectures: the cascaded
pixel domain transcoder (CPDT) and the DCT domain
transcoder (DDT) [5]. The first one is adopted in this paper as
the transcoder architecture for the CIF-to-QCIF transcoding,
as shown in Figure 1. The simplified encoder is different from
a stand-alone video encoder in that the motion estimation,
macroblock mode decision, and some other coding processes
may reuse the decoded information from the incoming video
stream.
The spatial resolution reduction uses down-sampling, which
changes the picture resolution from the CIF (352×288 pixels)
resolution to the QCIF (176 × 144 pixels) format, using the
down-sampling factor 352 : 176 = 2 : 1. This factor can be
achieved by up-sampling by 1 and then down-sampling by 2,
as shown in Figure 3 ( S = 1, N = 2 ), in which h(v) is a
low-pass filter [5].
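The 2:1 down-sampling step can be sketched as follows, using a plain 2 × 2 averaging kernel as the low-pass filter h(v). The kernel choice is illustrative only; the article evaluates sixteen different filters.

```python
# Sketch of 2:1 spatial down-sampling (e.g. CIF 352x288 -> QCIF 176x144):
# each output pixel is the low-pass filtered value of a 2x2 input block.
# The 2x2 mean used as h(v) here is illustrative only.

def downsample_2to1(frame):
    """frame: 2D list of luminance values with even dimensions."""
    out = []
    for r in range(0, len(frame), 2):
        row = []
        for c in range(0, len(frame[0]), 2):
            total = (frame[r][c] + frame[r][c + 1] +
                     frame[r + 1][c] + frame[r + 1][c + 1])
            row.append(total // 4)  # integer mean of the 2x2 block
        out.append(row)
    return out

frame = [[10, 20, 30, 40],
         [10, 20, 30, 40]]
print(downsample_2to1(frame))  # [[15, 35]]
```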
Fig. 3. The Interpolation-decimation routine for a change of M/L in terms
of transmission rate.
The filters used in this article are:
within the range (p(i)−2σ, p(i)+2σ) . Then, the average
of pixel intensities in the range is computed [10].
Weighted Average: this technique computes the average of
all entries with varying weights; each weight depends on
the position of the pixel in the neighborhood, as seen in
Figure 4. In this case, the smoothing is less intense because
there is more influence from the central pixel [11].
Moving Average: this technique replaces values of an
M × M video block by a single pixel, which assumes
the arithmetic mean of the pixels within the M × M
block [8].
Median: it provides a reorganization of the values of the
pixels of an M × M block in an increasing way and
chooses the central value.
Mode: for the calculation of the mode, a comparison is
made with the value that is more frequent in the M × M
block [9].
Sigma: it calculates the mean (p(i)) and standard deviation σ of the block M × M and verifies which pixels are
Fig. 4.
Representing the neighborhood of the central pixel with value ps .
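The sigma filter described above can be sketched as follows; the fallback for a flat block (where σ = 0 and the open interval is empty) is an assumed convention:

```python
import numpy as np

def sigma_filter(block):
    """Sigma filter of an M x M block: compute the mean p and standard
    deviation sigma, keep the pixels within (p - 2*sigma, p + 2*sigma)
    and return their average."""
    block = np.asarray(block, dtype=float)
    p, s = block.mean(), block.std()
    kept = block[(block > p - 2 * s) & (block < p + 2 * s)]
    # For a flat block (sigma = 0) no pixel lies strictly inside the
    # open interval, so fall back to the plain mean (assumed convention).
    return kept.mean() if kept.size else p

# The single outlier (255) falls outside the 2-sigma range and is discarded.
print(sigma_filter([[10, 10, 10], [10, 10, 10], [10, 10, 255]]))  # 10.0
```

The example shows why the sigma filter smooths less aggressively than the plain mean: isolated outliers are excluded before averaging.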
This article presents three weighted averages, given by Equations 1, 2 and 3:

g(x, y) = (1/5) (xs + (xt + xt + xu + xu))    (1)

g(x, y) = (1/9) (xs + (xt + xt + xu + xu + (xv + xv + xz + xz)))    (2)

g(x, y) = (1/2^n) (2^3 xs + 2^2 (xt + xt + xu + xu) + 2 (xv + xv + xz + xz)), with n = 5    (3)
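A minimal sketch of Equations 1–3, assuming xt/xu denote the edge neighbors and xv/xz the corner neighbors of the 3 × 3 neighborhood of Figure 4 (the exact positions are not spelled out here):

```python
def weighted_averages(nb):
    """The three weighted averages of Equations 1-3 over a 3x3 neighborhood.

    nb is a 3x3 list of pixel values; xs is the central pixel, xt/xu the
    edge neighbors and xv/xz the corner neighbors (assumed mapping)."""
    xs = nb[1][1]
    edges = nb[0][1] + nb[2][1] + nb[1][0] + nb[1][2]     # xt + xt + xu + xu
    corners = nb[0][0] + nb[0][2] + nb[2][0] + nb[2][2]   # xv + xv + xz + xz
    g1 = (xs + edges) / 5.0                               # Equation 1
    g2 = (xs + edges + corners) / 9.0                     # Equation 2
    g3 = (2**3 * xs + 2**2 * edges + 2 * corners) / 2**5  # Equation 3, n = 5
    return g1, g2, g3

# On a flat neighborhood every average returns the common value,
# since each set of weights sums to one (for Equation 3: 8 + 16 + 8 = 32).
print(weighted_averages([[4, 4, 4], [4, 4, 4], [4, 4, 4]]))  # (4.0, 4.0, 4.0)
```

Note how Equation 3 gives the central pixel the largest power-of-two weight, which is what makes its smoothing the least intense of the three.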
The transcoder used in this article includes the cited filters, with 2 × 2, 3 × 3 and 4 × 4 windows. For the last two, the videos were generated taking the pixels around the reference pixels. Those filters have been chosen for their simplicity. The moving average filter also used the 1 × 1 window, and was named simple elimination.
The H.264 encoder reduces the video bit rate and temporal resolution, in order to obtain the bit rates needed for the subjective tests.
For the evaluation of a video transcoder, two methods to assess the video quality are used: objective and subjective. The objective measurement is fast and simple, but it has a low correlation with the human perception of quality. On the other hand, the subjective measurement is expensive and time consuming.
For objective evaluation this paper uses two methods: PSNR and SSIM. The PSNR is a measure that performs a pixel-by-pixel comparison between the reference image and the test image. The SSIM is a method that takes into account the structural information of the image, those attributes that are reflected in the structure of the objects of the scene, which depend on the average luminance and contrast of the image.
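The pixel-by-pixel PSNR comparison can be sketched as follows (peak value 255 for 8-bit images; the distorted frame here is illustrative, not the paper's data):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Pixel-by-pixel PSNR in dB between a reference and a test image."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((144, 176), 128, dtype=np.uint8)   # flat QCIF-sized frame
noisy = ref.copy()
noisy[::2, ::2] += 16                            # known distortion, MSE = 64
print(round(psnr(ref, noisy), 2))                # about 30.07 dB
```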
For subjective evaluation this paper is based on the ITU-T P.910 standard for subjective evaluation of multimedia [12]. The standard mentions three forms of assessment: Absolute Category Rating (ACR), Degradation Category Rating (DCR) and Pair Comparison (PC). This paper uses the PC method.
The structural similarity metric (SSIM) is attracting the
attention of the research community because of the good
results obtained in the perceived quality of representation [13].
The SSIM measures how the video structure differs from the
structure of the reference video, involving the evaluation of
the structural similarity of the video.
The SSIM indexing algorithm is used for quality assessment
of still images, with a sliding window approach. The window
size 8 × 8 is used in this paper. The SSIM metrics define
the luminance, contrast and structure comparison measures,
as defined in Equation 4 [14], [15].
l(x, y) = (2 μx μy) / (μx² + μy²)

c(x, y) = (2 σx σy) / (σx² + σy²)

s(x, y) = σxy / (σx σy)
The average time for presentation and voting should be equal to or less than 10 s, depending on the voting process used. The presentation time may be reduced or increased, according to the content.
Tests were carried out with 20 people. Each participant watched six videos four times, generating 120 samples per video. The participants marked the quality score of a video clip on an answer sheet, using a discrete scale from 0 up to
A cell phone (Nokia N95) was used for the field tests. The distance between the participants and the device was 18 cm, calculated by multiplying the smaller device screen dimension by six (3 cm × 6). The tests lasted an average of 30 minutes.
This section presents the results for the transcoded videos and for the same videos transcoded after coding; then the comparison is made. The Mobile, News and Foreman videos [16] were used for the analysis, with 10 s of each one. These videos were chosen for displaying the following characteristics:
• Mobile: high texture and slow movement, Figure 5;
• News: little texture and slow movement, Figure 6;
• Foreman: reasonable texture and rapid movement, Figure 7.
The SSIM metric is given in Equation 5:
SSIM(x, y) = [(2 μx μy + C1)(2 σxy + C2)] / [(μx² + μy² + C1)(σx² + σy² + C2)]    (5)
The constants C1 and C2 are defined in Equation 6:

C1 = (K1 L)²  and  C2 = (K2 L)²    (6)
in which L is the dynamic range of the pixel values, and K1 and K2 are two small constants, such that C1 and C2 take effect only when (μx² + μy²) or (σx² + σy²) is small. For all experiments in this paper, K1 = 0.01, K2 = 0.03 and L = 255, for 8 bits/pixel gray scale images. The quality measure of a video is between 0 and 1, with 1 as the best value.
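Equation 5 can be sketched for a single window as follows (the full metric averages this value over the 8 × 8 windows slid over the frame):

```python
import numpy as np

def ssim_window(x, y, K1=0.01, K2=0.03, L=255):
    """SSIM index of Equation 5 for one window, with the constants of
    Equation 6 (K1 = 0.01, K2 = 0.03, L = 255)."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    x, y = x.astype(float), y.astype(float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + C1) * (2 * cov + C2))
            / ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2)))

x = np.random.randint(0, 256, (8, 8))
print(ssim_window(x, x))  # identical windows give 1.0
```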
Fig. 5. Mobile Video.
This method was chosen because the test sequences are presented in pairs, allowing a better comparison between the transcoding methods. The PC method consists of test systems (A, B, C, etc.) that are arranged in all possible n(n − 1) combinations of the type AB, BA, CA, etc. Thus, all pairs are displayed in both possible orders (e.g., AB and BA). After each pair presentation, the subject decides which video has the best quality.
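The n(n − 1) ordered presentations of the PC method can be generated directly:

```python
from itertools import permutations

def pc_presentation_orders(systems):
    """All n(n-1) ordered pairs for the Pair Comparison (PC) method,
    so that every pair is shown in both orders (e.g. AB and BA)."""
    return list(permutations(systems, 2))

pairs = pc_presentation_orders(["A", "B", "C"])
print(len(pairs))  # 6, i.e. n(n-1) with n = 3
```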
The method specifies that, after each presentation, the participants are invited to assess the quality of the indicated sequence.
Fig. 6. News Video.
For the Mobile video the tests showed that the best results were obtained with the 4 × 4 Sigma, 2 × 2 Sigma and 4 × 4 Median filters. For the News video the best results correspond to the videos processed with the 2 × 2 Sigma, 2 × 2 Median and 4 × 4 Median filters. For the Foreman video the best results correspond to the videos processed with the Weighted Average 3, 3 × 3 Moving Average and 2 × 2 Sigma filters. These results are shown in Figure 8.
Table III and Figure 9 show the results and the SSIM curves, respectively, for the transcoded videos.
Fig. 7. Foreman Video.
A. Objective Evaluation
The efficiency of the transcoder is evaluated by the PSNR and the SSIM for the processed videos. Table II and Figure 8 show the results and the PSNR curves, respectively, for the transcoded videos.
Table II. PSNR results for the transcoded videos (Simple Elimination; Moving Average, Median, Mode and Sigma filters with 2 × 2, 3 × 3 and 4 × 4 windows; Weighted Averages 1, 2 and 3).
Table III. SSIM results for the transcoded videos, for the same set of filters.
Fig. 8. PSNR curves for the transcoded videos.
Fig. 9. SSIM curves for the transcoded videos.
It can be observed from Table III that the best results for the Mobile video were obtained using the 2 × 2 Sigma, 2 × 2 Median and Weighted Average 2 filters. For the News and Foreman videos the best results were obtained using the 2 × 2 Median, 2 × 2 Sigma and Weighted Average 1 filters.
From the PSNR and SSIM results, the correlation between the two measures could be computed. For the Mobile video the correlation obtained was 0.1408, which is a weak correlation. For the News video the correlation found was 0.5424, which is an average correlation. For the Foreman video the correlation obtained was 0.7492, which is a strong correlation.
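The correlations above are consistent with the Pearson coefficient; a minimal sketch, with illustrative scores rather than the paper's data:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two lists of quality scores
    (e.g. PSNR vs. SSIM, or an objective metric vs. MOS)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Illustrative scores only, not the paper's data.
psnr_scores = [28.1, 29.4, 30.2, 27.5]
ssim_scores = [0.80, 0.84, 0.88, 0.78]
print(round(pearson(psnr_scores, ssim_scores), 4))
```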
B. Processing Time
Regarding the processing time, it is possible to analyze
the increase in time as the filter window increases, as shown
in Table IV. This table shows that the sigma and mode
filters demand longer processing periods as compared with
the moving average and the weighted average filters, and the
median processing time is slightly higher than the average.
The best results considering the processing time were obtained with the simple elimination, the weighted averages, the 2 × 2 and 3 × 3 moving averages and the 2 × 2 median. The results for the sigma filter shown in Table IV are given as the average of the obtained values, because the processing of each window depends on the number of pixels retained.
Table IV. Processing time for each transcoding method (Simple Elimination; Moving Average, Median, Mode and Sigma filters with 2 × 2, 3 × 3 and 4 × 4 windows; Weighted Averages 1, 2 and 3).
Table V. MOS values for the Foreman video, for the eight transcoding methods used in the subjective tests.
Fig. 10. MOS for the Foreman video.
C. Subjective Evaluation
The evaluation of the transcoder with the subjective method used eight videos, transcoded using the Weighted Average 1, 2 × 2 Sigma, 2 × 2 Median, Weighted Average 2, 3 × 3 Sigma, 3 × 3 Median, 3 × 3 Moving Average and Weighted Average 3 filters. The subjective tests were performed using the PC method on the N95 device, with all videos encoded using the H.264 encoder at a bit rate of 243 kbit/s and 15 frames/s.
For the Foreman video the MOS values are shown in Table V and Figure 10. The best results for this video correspond to the videos transcoded using the 2 × 2 Sigma, 2 × 2 Median, Weighted Average 3 and 3 × 3 Median filters.
For the Mobile video the MOS values are shown in Table VI and Figure 11. The best results correspond to the videos transcoded using the Weighted Average 3 and 3 × 3 Median filters.
For the News video the MOS values are shown in Table VII and Figure 12. The best results correspond to the videos transcoded using the 2 × 2 Sigma and 2 × 2 Median filters.
Table VI. MOS values for the Mobile video, for the eight transcoding methods.
Table VII. MOS values for the News video, for the eight transcoding methods.
For the evaluation using the PSNR method, the 4 × 4 Median, 2 × 2 Sigma and 2 × 2 Median filters produced the best results. For the SSIM method, the 2 × 2 Sigma and 2 × 2 Median filters showed the best results. For the subjective tests, the spatially transcoded videos using the 2 × 2 Median and 2 × 2 Sigma filters obtained better results.
As the spatially transcoded videos using the 2 × 2 Median and 2 × 2 Sigma filters give the best results for both objective and subjective measures, one concludes that these techniques are appropriate for spatial transcoding. The 2 × 2 Median has a small advantage over the 2 × 2 Sigma regarding the required processing time.
The correlation results show that the SSIM method presents a better correlation with the subjective tests than the PSNR method, although, depending on the video, the SSIM presents a low correlation with the subjective tests.
Fig. 11. MOS for the Mobile video.
The authors acknowledge the financial support from Capes and CNPq, and thank Iecom for the use of its infrastructure.
Fig. 12. MOS for the News video.
The correlation between the MOS and PSNR results for each video was calculated, resulting in a low correlation for the Foreman and Mobile videos, 0.3721 and 0.3209, respectively, and a strong correlation for the News video, 0.7745.
The correlation between the MOS and SSIM values was better, as expected: for the Foreman video the correlation between the SSIM and the MOS is average, 0.5837; for the Mobile video the correlation is weak, −0.372; and for the News video the correlation is strong, 0.8486.
The filters that gave the best results were the 2 × 2 Sigma, 3 × 3 Median, Weighted Average 3 and 2 × 2 Median.
The article discussed the characteristics of mobile television, mainly related to quality issues. It has been shown that the spatially transcoded videos for this service presented satisfactory results, since all results provided acceptable PSNR values.
[1] D. T. T. A. Group, “Television on a handheld receiver, broadcasting with
DVB-H,” Geneva, Switzerland, 2005.
[2] A. Kumar, Mobile TV: DVB-H, DMB, 3G Systems and Rich Media Applications. Focal Press Media Technology Professional, 2007.
[3] D. V. Broadcasting, “DVB approves DVB-SH specification - new
specification addresses delivery of multimedia services to hybrid satellite/terrestrial mobile devices,” 2007.
[4] M. S. Alencar, Digital Television Systems. New York: Cambridge University Press, 2009.
[5] J. Xin, M.-T. Sun, B.-S. Choi, and K.-W. Chun, “An HDTV-to-SDTV
spatial transcoder,” Circuits and Systems for Video Technology, IEEE
Transactions on, vol. 12, no. 11, pp. 998–1008, Nov 2002.
[6] I. Ahmad, X. Wei, Y. Sun, and Y.-Q. Zhang, “Video transcoding: an
overview of various techniques and research issues,” Multimedia, IEEE
Transactions on, vol. 7, no. 5, pp. 793–804, Oct. 2005.
[7] J. Xin, C.-W. Lin, and M.-T. Sun, “Digital video transcoding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 84–97, Jan. 2005.
[8] T. Acharya and A. K. Ray, Image Processing - Principles and Applications. Hoboken, New Jersey, USA: John Wiley & Sons, Inc., 2005.
[9] H. Wu and K. Rao, Digital Video Image Quality and Perceptual Coding.
Boca Raton, FL, USA: CRC Press Taylor & Francis Group, 2006.
[10] R. Lukac, B. Smolka, K. Plataniotis, and A. Venetsanopoulos, “Generalized adaptive vector sigma filters,” International Conference on
Multimedia and Expo. ICME ’03., vol. 1, pp. I–537–40 vol.1, July 2003.
[11] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Boston,
MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2001.
[12] ITU-T, “ITU-T recommendation P.910, subjective video quality assessment methods for multimedia applications,” September 1999.
[13] R. de Freitas Zampolo, D. de Azevedo Gomes, and R. Seara, “Avaliação
e comparação de métricas de referência completa na caracterização
de limiares de detecção em imagens,” XXVI Simpósio Brasileiro de
Telecomunicações - SBrT 2008, Sept. 2008.
[14] Z. Wang, L. Lu, and A. C. Bovik, “Video quality assessment based on structural distortion measurement,” Signal Processing: Image Communication, vol. 19, no. 2, pp. 121–132, February 2004.
[15] M. Vranjes, S. Rimac-Drlje, and D. Zagar, “Objective video quality metrics,” 49th International Symposium ELMAR 2007, pp. 45–49, Sept. 2007.
[16] “YUV video sequences,” November 2008.
Comparing the “Eco Controllo” video codec with MPEG-4 and H.264
Claudio Cappelli1
Eco Controllo SpA
Via Camillo De Nardis 10, 80127 Napoli (NA), Italy
[email protected]
This paper reports the results of an experimental comparison between the video codec produced by the company Eco Controllo SpA and the main commercial standards, MPEG-4 and H.264. In particular, the experiments aimed to test the ratio between the quality of the compressed image and the achieved bit rate, where the quality of the compressed image is meant as high or low fidelity with respect to the original image. Such fidelity has been measured by means of both objective and subjective tests. For the former, the Peak Signal to Noise Ratio (PSNR) and the Structural Similarity Measure (SSIM) have been used. For the subjective tests, the Double Stimulus Impairment Scale (DSIS) methodology standardized by the International Telecommunication Union (ITU) has been employed [1]. The tests have been repeated for different video resolutions (corresponding to the PAL and HD video formats), different frame rates (25, 30, etc.), and different bit rates. Finally, the evaluation of critical aspects concerning live video transmission has been beyond the scope of these experiments.
Digital images are subject to several distortions introduced during the acquisition, processing, compression, storage, transmission, and reproduction phases, each of which can decrease the perceived quality. Since images are intended for human beings, the natural way to quantify their quality is to use subjective evaluation.
The methodologies for subjective analysis have been standardized by the International Telecommunication Union (ITU) [1], aiming to make such tests reproducible and verifiable. In practice, subjective tests consist in presenting a selection of images and videos to a sample of the population. Users watch the video contents and express a vote based on the perceived quality, highlighting the presence of aberrations, or distortions, with respect to a given reference content. The results are suitably elaborated, and enable the evaluation of the average quality of the system under examination.
Objective quality metrics represent an alternative to subjective metrics. They allow us to considerably reduce costs, since the tests they prescribe can be accomplished much more rapidly. Objective quality metrics derive from subjective analysis, representing a kind of abstraction or theoretical model of it. They can be classified based on the presence or absence of a reference system (an original video or image without distortions) with which the system under examination can be compared. Many existing comparison systems are “full-reference”, meaning that every system under evaluation is compared with a reference system without distortions. Nevertheless, in many practical situations it is not possible to use a reference system, and in such cases it is necessary to adopt a so-called “no-reference” or “blind” approach. A third situation is that in which there is partial availability of a reference system, that is, only some basic characteristics of the reference system are known. In such a case, the available information can be considered as a valid support for evaluating the quality of the system under examination. This approach is referred to as “reduced-reference”.
The simplest and most widely adopted “full-reference” metric is the so-called “peak signal-to-noise ratio” (PSNR), based on the mean square error (MSE), which is in turn computed by averaging the squares of the differences in intensity between homologous pixels of the compressed and the reference images. The PSNR is simple to compute and has a clear meaning. Nevertheless, it does not always reflect the visual quality as it is perceived by humans [3, 4, 5, 6, 7, 8, 9, 10, 11].
In the last three decades, a considerable effort has been made to develop objective quality metrics exploiting the known characteristics of the Human Vision System (HVS). An example of such metrics is the SSIM (Structural Similarity Measure) index, which compares patterns of pixels based on intensities normalized with respect to luminance and contrast.
This paper describes the results of an experimental comparison between the video codec produced by the company Eco Controllo SpA and two main standards, MPEG-4 and H.264. Eco Controllo commissioned this comparison to the Italian research center on ICT, Cerict, which accomplished it by means of both objective and subjective metrics. For the objective analysis both the PSNR and the SSIM index have been used. For the subjective analysis the DSIS technique has been used [1].
The paper is organized as follows. Section 2 describes the type of tests that have been performed, including test parameters and the characteristics of the hardware used, Section 3 describes the test cases used, Section 4 describes the results of the objective tests, and Section 5 those of the subjective tests. Finally, conclusions are given in Section 6.
Comparative Tests
The tests have been of the full-reference type, and have produced both objective and subjective analyses, aiming to evaluate compression quality. The comparative study has been executed on a sample of files compressed in batch modality, that is, first all the original files have been compressed, and then they have been analyzed. The only constraints the codecs had to abide by were compliance with the required bit rate and the size of the compressed file. Although Eco Controllo SpA aims to use its codec for live broadcasting, the evaluation of possible critical issues arising during the transmission and reception of video signals, and of issues related to the hardware and software resources needed to execute the selected codecs, has been beyond the scope of this test. Furthermore, no constraints have been imposed on the time needed to compress the videos.
The codec produced by Eco Controllo SpA has been compared with the main known standards, MPEG-4 and H.264. To this end, a single commercial software package embedding both these codecs has been chosen. All the compared codecs have been tested by using their respective default parameters, and without human intervention. In particular, the Simple Profile has been used for the MPEG-4 compression, and the Main Profile for the H.264 compression. Moreover, beyond the specification of the bit rate and the frame rate, no other parameters have been specified, and no pre/post-production work has been performed. Finally, the compressions and tests have been executed on a Siemens Celsius V830 workstation, whose characteristics are described in Table 1.
Test Cases
When executing comparative tests it is particularly important to choose a significant test set. Using a standard
Table 1: Workstation Siemens Celsius V830
RAM: 8 GB
CPU: 2 AMD Opteron 240
Storage: 2 SATA II HDs of 400 GB
Graphics: NVIDIA Quadro FX 3400 - 256 MB
OS: Windows XP64
test set has the advantage of providing comparable test results, often reducing the cost of the tests. On the other hand, exclusively using well-known video sequences potentially reduces test integrity, since the use of ad hoc compression techniques, optimized for publicly available test sets, cannot be ruled out. In this experimental comparison several types of test sets have been used, including test sets commonly used in scientific studies in this area, and heterogeneous video sequences commonly used in television programs, hence realized with professional quality. In particular, the test cases have been selected from the following video test sets:
• HDTV (720p - 50 Hz and 25 Hz) “SVT High Definition Multi Format Test Set” [12] - Video sequences produced by the Swedish television channel SVT, also available on the Video Quality Experts Group (VQEG) web site ftp://vqeg.its.bldrdoc.gov/HDTV/SVT_MultiFormat/. Moreover, the same video sequences have been reduced to derive the PAL video sequences used in the tests.
• CIF - Test Set: In particular the ”Derf” collection, available at
In order to execute the tests, a sample of 17 videos in three different formats (PAL@25Hz, 720p@25Hz, 720p@50Hz) has been selected. Such formats have been chosen to test the compression algorithms with respect to the standards that are currently, and in the near future, used in the television field. In particular, the PAL format has been chosen to test the system with respect to the technology currently used in television transmission systems, whereas the 720p resolution will be the one used in the near future with the introduction of so-called high definition television.
Objective Tests on Video 720P and PAL
The objective tests have been executed by using the PSNR and SSIM metrics on a database of PAL and FullHD videos. Three series of 17 video sequences, in the PAL@25Hz, 720p@25Hz and 720p@50Hz formats, respectively, have been compressed by using the H.264 and MPEG-4 codecs, and the video codec by Eco Controllo. Each video sequence has been compressed at 500, 1000, 2000, 3000 and 4000 Kbps, yielding 765 different compressed files. Among these, only those having a size within ±5% of F have been considered, where

F = (br · s) / 8

br: requested bit rate in Kbps (1000 bit/s)
s: video duration in seconds
F: file size in KBytes

Sometimes the MPEG-4 codec fails to reach the requested bit rate (indicated with a 0 value in the figures), whereas the SSIM of the H.264 codec drops to a poor value, and the Eco Controllo codec keeps a relatively high score, never going below an average SSIM of 0.71.
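The ±5% size check can be sketched as follows; the division by 8 converts kbit to KBytes:

```python
def expected_size_kbytes(bitrate_kbps, duration_s):
    """F = (br * s) / 8: expected file size in KBytes for a requested bit
    rate br in kbit/s and a duration s in seconds (8 kbit = 1 KByte)."""
    return bitrate_kbps * duration_s / 8.0

def within_tolerance(actual_kbytes, bitrate_kbps, duration_s, tol=0.05):
    """Keep only files whose size lies within +/-5% of F."""
    f = expected_size_kbytes(bitrate_kbps, duration_s)
    return abs(actual_kbytes - f) <= tol * f

print(expected_size_kbytes(2000, 10))    # 2500.0 KBytes for 10 s at 2000 kbit/s
print(within_tolerance(2600, 2000, 10))  # True: within 5% of 2500
print(within_tolerance(2900, 2000, 10))  # False: off by 16%
```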
Among the 765 produced files, only 507 resulted valid after compression with the requested bit rate, and these have successively been evaluated through the PSNR and SSIM metrics, by using the MSU Video Quality Measurement Tool rel. 1.4, produced by the Graphics & Media Lab Video Group of Moscow State University.
Table 2: Test results through the SSIM metrics
Figure 1: Comparison with the SSIM metric - 720P at 25Hz
The results, synthesized in Tables 2 and 3, reveal that the Eco Controllo codec has preserved the best quality with respect to the two selected metrics, both on average and on each analyzed video sequence. Moreover, the Eco Controllo codec has turned out to be more stable with respect to the tested video sequences, that is, the gap among single test sessions is lower than the one observed with the H.264 and MPEG-4 codecs, respectively. This is confirmed by the confidence intervals and by Figures 5, 1, 6, 2, 4, and 3.
Another interesting characteristic is that the Eco Controllo and H.264 codecs reach the same maximum average vote, and the same can be said for the MPEG-4 codec. Nevertheless, by observing the worst cases, it can be noticed that sometimes the MPEG-4 codec fails to reach the requested bit rate.
Table 3: Test results through the PSNR metrics
Figure 2: Comparison with the SSIM metric - 720P at 50Hz
Subjective Tests on Video 720P and PAL
In order to validate results of objective tests, the selected
codecs have been further compared through subjective tests
accomplished by means of the DSIS method [1]. In particular, 8 video sequences have been randomly selected, and
successively shown at three different bit rates (1000,2000 e
3000Kbps) to 16 human evaluators. These have been subdivided in two different groups, each participating to a different evaluation session of 30 minutes.
The user evaluation data, available on paper, have been digitized and successively processed according to the DSIS methodology, yielding the results reported in Tables 4 and 5.
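A sketch of how a mean opinion score and its confidence interval can be computed for one codec/bit-rate cell of Tables 4 and 5; the normal-approximation interval and the votes are assumptions, since the paper states neither the formula used nor the raw data:

```python
import math

def mos_with_ci(scores, z=1.96):
    """Mean opinion score for one codec/bit-rate cell, with a 95%
    normal-approximation confidence interval (assumed formula)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    half = z * math.sqrt(var / n)
    return mean, (mean - half, mean + half)

votes = [5, 5, 4, 5, 4, 5, 5, 4]  # hypothetical DSIS votes on a 1..5 scale
mean, (lo, hi) = mos_with_ci(votes)
print(round(mean, 3), round(lo, 3), round(hi, 3))
```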
Figure 6: Comparison with the PSNR metric - 720P at 50Hz
Figure 3: Comparison with the SSIM metric - PAL at 25Hz
Figure 4: Comparison with the PSNR metric - PAL at 25Hz
Only videos in the 720p format (i.e., 1280×720 pixels) have been selected, with both 25 and 50 Hz frame frequencies. The choice of such parameters is motivated by the fact that they are used within Digital Television and High Definition Digital TV (HDTV) in all those countries (Italy included) traditionally using the PAL and SECAM video transmission systems.
In order to derive more meaningful results, low bit rates
have been used to stress the selected codecs and test their
behavior under critical conditions.
As prescribed by the DSIS methodology, evaluators have
been placed in a comfortable room, and seated in positions
guaranteeing an appropriate visualization angle with respect
to a FullHD plasma monitor used to show video sequences.
Evaluators have been requested to express the quality of
each shown video sequence by choosing one of the following options:
• imperceptible defects
• perceptible but not annoying defects
• slightly annoying defects
• annoying defects
• very annoying defects
Figure 5: Comparison with the PSNR metric - 720P at 25Hz
The evaluators have been selected among students and workers. Each of them has previously undergone the Ishihara test for color blindness [2]. The latter, published by Prof. Shinobu Ishihara in 1917, consists in showing the user several colored disks, named Ishihara disks, each containing a circle of colored points arranged to form a number visible to people without color blindness, and invisible to people having deficiencies in this regard, especially in the perception of the red and green colors.
Table 4: Subjective Analysis results
Such results essentially confirm those derived with the objective metrics, even though the gap among the different codecs is amplified here. In particular, the confidence intervals shown in Tables 4 and 5 seem to highlight a greater stability of the Eco Controllo algorithm. Even in this case, considering the maximum average score, the H.264 and Eco Controllo algorithms achieve similar results, which probably means that users do not perceive meaningful defects when the codecs are used with less demanding bit rates. Nevertheless, in the worst and average cases the Eco Controllo codec achieves more precise scores, that is, with less variation than the other codecs. Thus, under the test conditions described here, the Eco Controllo codec showed better performance than the other selected codecs.
Table 5: Subjective Analysis results, grouped by Bit Rate (for example, the Eco Controllo confidence intervals are [4.59, 4.85] at 2000 Kbps and [4.68, 4.95] at 3000 Kbps).
The successful diffusion of digital video applications depends on the capability to have low-cost transmission systems for high-quality video sequences. This means being able to achieve high compression ratios in order to transmit images on low-bandwidth networks, yielding considerable cost reductions. However, in doing this it is necessary to preserve an adequate quality of the compressed images with respect to the original images. This work described the results of an experimental comparison of the video codec produced by Eco Controllo with respect to the main commercial standards, by using several test methodologies described in the literature, and a considerable number of heterogeneous video sequences. According to the results of such tests, both the objective and subjective test methodologies described in this paper have revealed a better quality of the video sequences compressed through the Eco Controllo codec, for each chosen bit rate.
[1] ITU-R, “Methodology for the subjective assessment of the quality of television pictures,” Recommendation ITU-R BT.500-11, pp. 1–48, 2002.
[2] S. Ishihara, “Tests for colour-blindness,” Handaya, Tokyo, Hongo Harukicho, 1917.
[3] B. Girod, “What's wrong with mean-squared error,” in Digital Images and Human Vision, A. B. Watson, Ed. Cambridge, MA: MIT Press, pp. 207–220, 1993.
[4] P. C. Teo and D. J. Heeger, “Perceptual image distortion,” in Proc. SPIE, vol. 2179, pp. 127–141, 1994.
[5] A. M. Eskicioglu and P. S. Fisher, “Image quality measures and their performance,” IEEE Trans. Commun., vol. 43, pp. 2959–2965, Dec. 1995.
[6] M. P. Eckert and A. P. Bradley, “Perceptual quality metrics applied to still image compression,” Signal Processing, vol. 70, pp. 177–200, Nov. 1998.
[7] S. Winkler, “A perceptual distortion metric for digital color video,” in Proc. SPIE, vol. 3644, pp. 175–184, 1999.
[8] Z. Wang, “Rate scalable foveated image and video communications,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Texas at Austin, Austin, TX, Dec. 2001.
[9] Z. Wang and A. C. Bovik, “A universal image quality index,” IEEE Signal Processing Letters, vol. 9, pp. 81–84, Mar. 2002.
[10] Z. Wang, “Demo images and free software for ‘A universal image quality index’”. Available: http: research/quality_index/demo.html
[11] Z. Wang, A. C. Bovik, and L. Lu, “Why is image quality assessment so difficult,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 4, Orlando, FL, pp. 3313–3316, May 2002.
[12] L. Haglund, “The SVT High Definition Multi Format Test Set,” Sveriges Television, 2006. Available: ftp:
An Experimental Evaluation of the Mobile Channel
Performance of the Brazilian Digital Television
Carlos D. M. Regis and Marcelo S. Alencar
Jean Felipe F. de Oliveira
Institute of Advanced Studies in Communications (Iecom)
Federal University of Campina Grande (UFCG)
Campina Grande, Brazil
Email: {danilo, malencar}
Positivo Informática S/A
Curitiba, Brazil
Email: [email protected]
Abstract—This work presents an analysis of the mobile channel of the Brazilian Digital Television System. With the advent of this system, diverse conditions must be emphasized, which have an impact on the development of the transmission equipment. The key variables that influence the degradation of the quality of the digital signal are the velocity of the mobile television receiver, the number of fading components, the random phase shift, the propagation delay and the Doppler effect. A robust knowledge of the behavior of those variables is important to evaluate the transmission channel, and to design the equipment in accordance with the available standards. Based on the study of the impact of those factors, a separate assessment of the influence of each variable on the quality of the demodulated constellations is proposed, along with its relevance for the transmission process. This research was conducted at the Positivo Informática S/A digital TV laboratory.
Index Terms—Mobile Television, ISDB-Tb, ISDTV, Digital TV,
The deployment of the digital television system in Brazil leads to modifications of the current transmission and reception standards, which implies the need to replace the transmitters and antennas currently used by television broadcasters, as well as the television sets installed in viewers' homes [1].
The purpose of this study is to create and simulate an urban transmission environment to enable the analysis of the major distortions suffered by the digital signal in the communication channel of the Brazilian digital television system, ISDB-Tb. It was verified that there are few studies on this topic [2].
The main metric used in this work is the Modulation Error Ratio (MER), measured at the receiver, which determines the relationship between the average power of the received symbols and the average power of their errors in the received constellation. The MER measurement observes the position of the received symbols in the demodulated constellation, and the analysis of these values determines the transmission channel quality [3]. The great majority of measurement equipment provides the MER and BER (Bit Error Rate) measurements separately, leaving aside the valuable information about the channel quality that a joint analysis could bring to light.
The main causes of distortion in urban environments are signal shadowing by natural or artificial obstacles, the Doppler effect, path fading and the multiple interferences originated, mainly, in analog and digital transmission systems with channels allocated at the same frequency or at adjacent ones [4] [5] [6].
The main simulated situations of this work consider the transmission channel of content for mobile and portable devices, since it makes no sense to evaluate fixed devices in movement. This channel will be called the mobile channel or 1seg channel throughout this work. However, considerations about the transmission channel of content for fixed set-top boxes, which will be called the fixed channel or fullseg channel, are not neglected and appear throughout the text. This is mainly due to the fact that the ISDB-Tb signal analysis program installed in the spectrum analyzer, which displays the demodulated constellations, does not display them in separate graphs. Thus, it became convenient to also analyze the fullseg channel (64-QAM modulation) in this work. The chosen parameters were isolated and, for each one, its influence on the degradation of the quality of the received signal was determined.
Given this scenario, the main variables of the mobile channel analyzed in this work are:
• Received power;
• Speed of the mobile device;
• Propagation delay;
• Components of fading;
• The C/N ratio.
For each of these variables, a study of its relation to the modulation error ratio (MER) is the final result of this work.
Figure 1 shows the complete setup of the measurement environment installed at the Positivo Informática S/A digital TV laboratory. The equipment used was:
• An ISDB-Tb transmitter;
where Ij and Qj are, respectively, the phase and quadrature components of the j-th received symbol, and Ĩj and Q̃j are, respectively, the ideally demodulated phase and quadrature components of the j-th received symbol. The calculation of the MER compares the current position of each received symbol with its ideal position. The value of the MER decreases when the symbols move away from their ideal positions.
The combination of all the interference in the transmission channel causes deviations in the positions of the constellation symbols relative to their nominal positions. Thus, this deviation can be considered a parameter for measuring the magnitude of the interference, and this is, in fact, the role of the modulation error ratio [8].
• A fading generator;
• A spectrum analyser;
• A mobile receiver (1seg) and a fixed receiver (fullseg);
• A power splitter.
Fig. 1. Setup of the measurement environment
This setup, illustrated in Figure 1, works as follows:
• The transmitter generates the signal at an intermediate frequency and sends it to the fading generator;
• The fading generator, in turn, adds the distortions chosen for each simulation case and sends the signal back to the transmitter at the same intermediate frequency;
• The transmitter then sends the signal to the spectrum analyser through a high quality coaxial cable passing through a power splitter;
• One of the power splitter outputs is connected to the mobile terminal (sometimes to the fixed terminal) and the other is connected to the spectrum analyser, where the ISDB-T Demodulation Analysis software is installed;
• The mobile receiver is plugged into a notebook equipped with video and audio software decoders. The stream contents were visualized in a proprietary application.
The ISDB-Tb Demodulation Analysis software provides the exhibition of the demodulated constellation and the spectrogram, and the measurement of the modulation error ratio.
A. Case I: Received Power

In this simulation, two signal-to-noise ratios were used: 40 dB, which represents a fairly high level, practically nonexistent in real situations, and 20 dB, which represents a good reception environment. The purpose of using such a high ratio was to isolate the behavior of the degradation of the modulation error ratio as a function only of the reduction of the received signal power (PR). Figures 2 and 3 show demodulated constellations. The simulation was done by lowering the received power from −10 dBm to −90 dBm, in steps of 5 dB for the first situation and 10 dB for the second. This channel has no external disturbance.
The Modulation Error Ratio (MER) is a measurement of the intensity of the degradation of a modulated signal, which affects the receiver's ability to recover the transmitted information. The MER can be compared to the signal-to-noise ratio of analog transmissions. This measure is widely used in cable digital television systems due to its efficiency in expressing the combined effects of the different perturbations in the communication channel. The MER reflects this combination very well and is defined over an interval of N symbols as follows [7],
MER = 10 \log_{10} \left( \frac{\sum_{j=1}^{N} \left( \tilde{I}_j^2 + \tilde{Q}_j^2 \right)}{\sum_{j=1}^{N} \left[ (I_j - \tilde{I}_j)^2 + (Q_j - \tilde{Q}_j)^2 \right]} \right) \ \mathrm{dB}
Fig. 2. PR = −30 dBm and C/N = 20 dB
B. Case II: Multipath Fading Components

For this simulation case, a transmission channel with a signal-to-noise ratio C/N of 40 dB and a received power of −20 dBm was configured. From this starting point, twenty multipath components were gradually added, one by one, in the fading generator. Figure 4 shows one sample of this simulation case.
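The effect of adding multipath components can be pictured with a simplified static baseband model, in which each echo is a delayed, attenuated and phase-shifted copy of the transmitted symbols (a sketch only; the actual fading generator applies far more elaborate, time-varying profiles, and the path parameters below are invented):

```python
import cmath
import random

def apply_multipath(symbols, paths):
    """Apply a static multipath channel to a baseband symbol stream.

    paths: list of (delay_in_symbols, gain, phase_rad) triples; the
    first entry (0, 1.0, 0.0) is the direct path.  Echoes are summed
    onto the direct signal, shifting the received constellation points.
    """
    out = []
    for n in range(len(symbols)):
        y = 0j
        for delay, gain, phase in paths:
            if n - delay >= 0:
                y += gain * cmath.exp(1j * phase) * symbols[n - delay]
        out.append(y)
    return out

random.seed(0)
qpsk = [complex(random.choice([-1, 1]), random.choice([-1, 1]))
        for _ in range(8)]
# Direct path plus two weaker echoes.
paths = [(0, 1.0, 0.0), (1, 0.3, 1.0), (2, 0.1, -0.5)]
rx = apply_multipath(qpsk, paths)
print(rx[0])  # only the direct path contributes to the first symbol
```

Each added echo spreads the received points further from their nominal positions, which is exactly the constellation degradation measured by the MER in this case.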
Table IV-C. Received video quality versus mobile terminal speed:

Speed (km/h)   Mobile Channel   Fixed Channel
…              No issues        Many issues
…              No issues        Many issues
…              No issues        No signal
…              No issues        No signal
…              No issues        No signal
…              No issues        No signal
…              No issues        No signal
…              Few issues       No signal
…              Many issues      No signal
…              No signal        No signal
Fig. 3. PR = −30 dBm and C/N = 40 dB

Fig. 4. Sample of the demodulated constellation of the channel with 7 fading components. C/N = 40 dB

C. Case III: Mobile Terminal Speed

For this simulation case, a transmission channel with a signal-to-noise ratio C/N = 40 dB, a received power of −25 dBm and five multipath fading components with significant power levels was configured. From this starting point, the speed of the mobile terminal was gradually increased at the fading generator. Table IV-C shows an evaluation of the received video quality at a moving mobile terminal. Figure 5 shows a sample of the speed test.

Fig. 5. Sample of the received constellation at 50 km/h and a screen capture of the spectrum. C/N = 40 dB

D. Case IV: Propagation Delay Spread

For this simulation, two situations were set up. In the first configuration, the received power was −40 dBm, which characterizes a common value of the signal power found in practice, at good reception locations. In the second situation, the received power was −80 dBm, which characterizes distant places or bad reception conditions (e.g. strong multipath fading) for mobile terminals. This value is close to the reception sensitivity limit of the majority of the mobile devices tested. Figure 6 shows a sample of the evolution of the degradation of the channel as a function of the delay spread for each simulated situation.

Fig. 6. Sample of the received constellation with a delay spread of 6 μs. PR = −30 dBm and C/N = 40 dB
A. Case I: Received Power

Figures 7 and 8 show the resulting graphs of the analysis of the relationship between the received power and the modulation error ratio (MER). It is possible to see that, for both simulated cases, the MER degrades nearly proportionally with the level of received power down to −50 dBm. Below this level, the degradation becomes approximately constant. It is worth mentioning that the Brazilian standard fixes the sensitivity threshold for receiving devices at −77 dBm. The Brazilian standard has not yet determined the sensitivity level for mobile devices but, in laboratory tests with several devices, the reception threshold for mobile devices ranged from −85 dBm to −93 dBm for a signal-to-noise ratio of 20 dB.
B. Case II: Fading Components

Figure 9 shows the graph of the relationship between the number of fading components and the MER. In this case, the

Fig. 7. Received Power × MER. C/N = 20 dB

Fig. 9. Fading components quantity × MER. C/N = 40 dB
C. Case III: Mobile Terminal Speed

Figure 10 depicts the graph of the relationship between speed and MER. Figure 10 also indicates that the MER for layer B tends to stabilize after 50 km/h. However, according to the information in Table IV-C, at this speed it would already be difficult to demodulate the information from this channel; since this layer is intended for transmission to fixed set-top boxes, it would, in other words, be useless. The mobile channel (layer A) also indicates a tendency to stabilize after 100 km/h. Table IV-C, obtained in laboratory simulations with mobile devices, shows that a mobile device compatible with the Brazilian standard would have its reception jeopardized at speeds above 200 km/h.
Fig. 8. Received Power × MER. C/N = 40 dB
samples showed a significant variation between the initial and final values, but a stable mean behavior over the range. Figure 9 shows that, even in a channel with a signal-to-noise ratio of 40 dB and a power of approximately −20 dBm, which represents a good reception condition, the influence of the quantity of fading components may prevent the device from displaying the received content. This scenario is common when the transmitters are located in the centers of large cities. In the field tests conducted in São Paulo, it was noted that in several places on Paulista Ave, where the great majority of the transmitters are located, even with a high level of received power, the combination of the fading components can cause the saturation of the receiver's tuner. Another significant disturbance in this environment is adjacent channel interference from analog and digital transmissions.
Fig. 10. Mobile terminal speed × MER. C/N = 40 dB
D. Case IV: Propagation Delay Spread

Figure 11 shows the graph of the relationship between the delay spread of a significant component of the signal and the MER. The delay intervals used in the tests ranged from 1 μs to 6 μs. Through the curves of the graph, it is possible to see that, with a delay spread of only 6 μs, the modulation error ratio decreases by approximately 5 dB. This is a considerable value in terms of lower received power, near the sensitivity limit.
Fig. 12.
Fig. 11. Delay spread × MER. C/N = 40 dB
E. Case V: C/N Ratio

Figure 12 shows the graph of the relationship between the modulation error ratio and the carrier-to-noise ratio C/N of the communication channel. In this case, two situations were simulated with different transmission powers: the first test used a transmission power of −20 dBm and the second used −40 dBm. Figure 12 indicates that, up to a carrier-to-noise ratio of approximately 12 dB, both simulated situations had a linear improvement in the modulation error ratio. From that point on, the graph of Figure 12 shows that, for the simulation case using −40 dBm, the value of the modulation error ratio tends to remain constant even when the C/N is increased. However, for the simulation case using −20 dBm, the modulation error ratio tends to improve almost linearly as a function of the improvement of the C/N ratio.
The graphs showed that, in practice, for the ISDB-Tb system, the QPSK modulation MER is the most affected by the studied effects. However, the small number of symbols used in the transmission implies a greater distance between the symbols of the QPSK constellation compared with the distance between the symbols of the 64-QAM constellation. Thus, the QPSK modulation has better immunity to the studied effects than the 64-QAM modulation used for fixed reception devices, even with elevated values of the modulation error ratio. In any case, the study of the impact of the behavior of the variables on the modulation error ratio provides a better understanding of the degradation of the constellation in each case. It was observed that, even with high power and a high carrier-to-noise ratio, the degradation caused by these variables implies, in most cases, the loss of the device's ability to tune a digital channel.
However, to observe all the imperfections in the transmission channel, a joint analysis of the behavior of the BER and the MER is strongly recommended. One of the weaknesses of the MER is that its measurement does not portray intermittent errors that result in a significant bit error rate.
[1] J. N. de Carvalho, "Propagação em áreas urbanas na faixa de UHF: aplicação ao planejamento de sistemas de TV digital," Master's thesis, Pontifícia Universidade Católica do Rio de Janeiro, Departamento de Engenharia Elétrica, Rio de Janeiro, Brazil, August 2004.
[2] L. E. A. de Resende, "Desenvolvimento de uma ferramenta de análise de desempenho para o padrão de TV digital ISDB-T," Master's thesis, Pontifícia Universidade Católica do Rio de Janeiro, Departamento de Engenharia Elétrica, Rio de Janeiro, Brazil, July 2004.
[3] O. Mendoza, "Measurement of EVM (Error Vector Magnitude) for 3G Receivers," Master's thesis, International Master Program of Digital Communications Systems and Technology, Ericsson Microwave Systems AB, Mölndal, Sweden, February 2002.
[4] G. Bedicks Jr., F. Yamada, E. L. Horta et al., "Handheld Digital TV Performance Evaluation Method," International Journal of Digital Multimedia Broadcasting, vol. 45, no. 3, 5 pages, June 2008.
[5] M. S. Alencar, Televisão Digital. São Paulo: Editora Érica, 2007.
[6] M. K. Simon and M.-S. Alouini, Digital Communication over Fading Channels: A Unified Approach to Performance Analysis. John Wiley & Sons.
[7] W. Fischer, Digital Television: A Practical Guide for Engineers. Springer, 2004, 384 pages.
[8] Y. Nasser, J.-F. Hélard and M. Crussière, "System Level Evaluation of Innovative Coded MIMO-OFDM Systems for Broadcasting Digital TV," International Journal of Digital Multimedia Broadcasting, 12 pages, March
Decision Support for Monitoring the Status of Individuals
Fredrik Lantz, Dennis Andersson, Erland Jungert, Britta Levin
FOI (Swedish Defence Research Agency)
Box 1165, S-581 11 Linköping, Sweden
{flantz, dennis.andersson, jungert, britta.levin}
must be developed to support the users in their
monitoring, planning and decision making activities.
Eventually, physiological monitoring systems must
also be possible to use in conjunction with various
command and control (C2) systems.
The structure of this work is outlined as follows. The objectives of the work are presented in section 2. Section 3 presents and discusses the fundamentals of the work, which include the physiological aspects and the general system focus. Communication issues are discussed in section 4, while the means for data integration, i.e. data fusion, are discussed in section 5. Section 6 presents the architecture of the command and control system, the decision support tools and the system for after action review. Related work is discussed in section 7 and, finally, the conclusions of the work appear in section 8.
Abstract: Systems for monitoring the status of individuals are useful in many situations and for various reasons. In particular, monitoring of physiological status is important when individuals are engaged in operations where the workload is heavy, e.g. for military personnel or responders to crises and emergencies. Such systems support commanders in the management of operations by supporting their assessment of the actors' physiological status. Augmentation of the commanders' situation awareness is of particular importance. For these reasons, an information system that supports monitoring of such operations is presented. The system gathers data from multiple media sources and includes methods for acquiring data from sensors, for data fusion and for decision making. The system can also be used for after action review and for training of actors.
For a large number of reasons, it is important to
monitor the physiological status of individuals
subjected to high physical workload in situations that
may lead to exhaustion and reduced performance.
Such situations concern soldiers in military
operations, fire fighters and other responders to
different crises and emergencies that face high
workload situations. However, novel methods and
technologies must be developed to make the system
effective and efficient. Examples of such
technologies comprise development of a wireless
body area network (WBAN) for the individual, i.e.
body worn sensors and equipment for wireless
communication. Of crucial importance to all such
systems is that they should be easy to carry around by
the individuals, as well as efficient with respect to
how data are collected, analyzed and transmitted to
the end-users for further analysis in their decision
making processes. Furthermore, means for
integration of data from multiple data sources must
also be available. This requires further development
of techniques and methods for sensor data analysis,
multi-sensor data fusion and techniques for search
and selection of relevant information from the data
sources. In all, the collected information should be
used as input to various decision support tools that
In this work, a system for handling multimedia information for physical monitoring of individuals is presented. Two aspects are the main focus of this work. The first aspect concerns the methods and algorithms for collection and analysis of physiological information for determination of the status of the actors. The system must support the decision makers' situation awareness by collection, fusion, filtering and visualization of data adapted to the users' requests. The second aspect concerns the development of a system architecture for such a monitoring system. Another aspect of interest is to support after action review (AAR) [10] to give the actors feedback from training sessions or actual missions. However, the actual development of the WBAN is not within the scope of this work.
3.1 Physiological aspects
Physiological and psycho physiological monitoring
can be of interest for various types of applications
such as health and safety monitoring, medical
emergencies, physically challenging exercises, and
study of task performance.
Continuous supervision of human physiological
status requires a set of sensors capable of detecting
the variables of interest. Depending on the target
application these variables may differ significantly.
Health and safety monitoring usually focuses on observations of one or more critical factors, such as
the heart activity in a patient with diagnosed heart
failure or the potentially fatal heat stress for a fire
fighter. For a medical emergency it is important to
use sensors capable of detecting vital signs such as
body temperature, respiration, heart rate, and blood
pressure. In a physiologically strenuous situation in a
hostile environment it may be relevant and feasible to
measure for instance body and ambient temperature,
heart rate, perspiration, altitude, position, and body
posture. Determination of task performance often
comprises both physiological measures of fitness as
well as psycho physiological measures including
subjective ratings, heart rate and heart rate variability
indicating mental stress.
Decision makers can be located in a command
central that may be located at a significant distance
from the monitored individuals. This implies that
communication between the personal server and the
command central must be executed via existing
communication infrastructures. For the system in this
study, all communication is performed via Internet by
attaching a GPRS module to the personal server.
GPRS communication is relatively expensive in
terms of energy consumption compared to
computations in the personal server. Since one of the
design goals of this system is low power
consumption there is a need to minimize the amount
of data being transferred. There are several ways of
reducing GPRS communication as discussed below.
4.1 Data reduction through fusion
By fusing the data at an early state the data being
transferred can significantly be reduced. This implies
that computations should be done locally on the
personal server and that the variables of interest are
known. An example of this is if the system
automatically determines the body posture of the
actor rather than transmitting e.g. raw accelerometer
data. The downside of such a solution is that it limits
the possibilities for post action analysis.
3.2 System aspects
In order to assess physiological status the various
variables need to be properly recorded and further
processed. The data recording system must be
designed to minimize interference with the users’
activities and their ability to move around freely.
Long duration exercises and difficult environments
put additional and tough requirements on the sensors
and the recording system. Generally, sensors should
be durable and easy to apply while the recording
system must be built to assure low weight and
volume, flexibility, and low power consumption. The
overall system structure is described in Figure 1. Data
are transmitted wirelessly from the actors to the
personal server where data are processed and further
transmitted to the decision support system where the
information is further processed and visualized.
4.2 Data reduction through skipping
Data may be collected at high sample rates,
sometimes much higher than needed for the analysis,
and skipping samples may be an option. When two
consecutive samples have no significant difference
then there is no need to transfer the second sample.
What is considered as an insignificant difference is
application dependent and may be changed at runtime
by the user if need arises. Skipping could also be
executed on a regular basis by sending every ith
sample which will reduce the granularity in the data
collected at the command central.
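Threshold-based sample skipping as described above can be sketched in a few lines (a minimal illustration; the function name, threshold and heart-rate values are invented for the example):

```python
def skip_insignificant(samples, threshold):
    """Transmit a sample only when it differs from the last
    transmitted value by more than `threshold` (delta filtering).
    The first sample is always sent so the receiver has a baseline."""
    sent = []
    last = None
    for s in samples:
        if last is None or abs(s - last) > threshold:
            sent.append(s)
            last = s
    return sent

heart_rate = [72, 72, 73, 72, 90, 91, 120, 119, 119, 80]
print(skip_insignificant(heart_rate, threshold=5))  # → [72, 90, 120, 80]
```

Here ten readings shrink to four transmitted values, which directly reduces the energy spent on GPRS communication, at the cost of coarser data for post-action analysis.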
4.3 Data reduction through subscription
In a scenario where several individuals are being
monitored and/or many sensors are being used on
each individual it is unlikely that all data are needed
at all times. Analysts may have different needs for
different stages of the operations. A subscription solution would then help reduce the data flow, since the analysts can always subscribe to only those data they are currently interested in. Thus, sensor data not subscribed to will not be transferred.
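A subscription filter of this kind can be sketched as follows (a simplified illustration under assumed names; the paper does not describe the actual service interface, and the class, actor and sensor identifiers here are invented):

```python
class SensorHub:
    """Minimal subscription filter: only data streams an analyst has
    subscribed to are forwarded over the (costly) GPRS link."""

    def __init__(self):
        self.subscriptions = set()  # (actor_id, sensor_name) pairs

    def subscribe(self, actor_id, sensor):
        self.subscriptions.add((actor_id, sensor))

    def unsubscribe(self, actor_id, sensor):
        self.subscriptions.discard((actor_id, sensor))

    def forward(self, readings):
        """Keep only subscribed readings; the rest are never transmitted."""
        return [r for r in readings
                if (r["actor"], r["sensor"]) in self.subscriptions]

hub = SensorHub()
hub.subscribe("actor1", "heart_rate")
readings = [
    {"actor": "actor1", "sensor": "heart_rate", "value": 88},
    {"actor": "actor1", "sensor": "accelerometer", "value": (0.1, 0.0, 9.8)},
    {"actor": "actor2", "sensor": "heart_rate", "value": 95},
]
print(hub.forward(readings))  # only actor1's heart rate is forwarded
```

Changing subscriptions at runtime lets analysts follow different actors or sensors at different stages of an operation without any change on the sensor side.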
Figure 1. The overall structure of the system.
the status. In e.g. [11], a Physiological Strain Index is calculated based on the heart rate and the core temperature. It is important to note that the determination of the status of a healthy individual can be more difficult than for an injured or sick individual, since healthy individuals' status values can be expected to be less extreme than those of injured or sick individuals. In the current situation, it is also important to take the actors' and their co-workers' own evaluations of their status into account. An automatically deduced status value may, in many applications, only serve to notify the user of a situation where he/she must request information from the actors.
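The exact index used in [11] is not reproduced here; as an illustration, one published formulation of a Physiological Strain Index (Moran et al.'s 0-10 scale, which may differ in detail from the variant used in [11]) combines core temperature and heart rate as follows:

```python
def physiological_strain_index(t_core, t_core0, hr, hr0):
    """Physiological Strain Index on a 0-10 scale (after Moran et al.).

    t_core, hr  : current core temperature (deg C) and heart rate (bpm)
    t_core0, hr0: resting baseline values for the same individual
    39.5 degC and 180 bpm are the assumed physiological maxima."""
    thermal = 5.0 * (t_core - t_core0) / (39.5 - t_core0)
    cardio = 5.0 * (hr - hr0) / (180.0 - hr0)
    return max(0.0, min(10.0, thermal + cardio))

# Resting 36.8 degC / 60 bpm; current 38.0 degC / 150 bpm.
print(round(physiological_strain_index(38.0, 36.8, 150, 60), 2))  # → 5.97
```

Note that the baselines are per-individual, which is consistent with the point above that systems can be improved by tailoring them to the characteristics of the individual actors.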
Data fusion is the process of combining data to estimate or predict the state of some system. The process is commonly separated into five different functional levels: sub-object assessment, object assessment, situation assessment, impact assessment and process refinement, see [8]. The end product of the data fusion process is a situation picture that is common for all the sensors and other data sources that have been used to estimate the state.
5.1 Modeling of physiological processes
According to [4], the models that are used in
modeling physiological phenomena are often linear,
deterministic and non-dynamic in spite of the fact
that these phenomena often are non-linear, stochastic
and dynamic. Consequently, there is large room for
improvement in this area using common techniques
from the data fusion area, e.g. Dynamic Probabilistic
Networks, Hidden Markov Models or Sequential
Monte-Carlo methods. As most models are aimed at a
certain group of people (e.g. females of a certain age
and weight), it is also possible to improve the
effectiveness of the systems by tailoring the
algorithms/systems to the unique characteristics of
the individual actors, see also [7].
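As an illustration of the Hidden Markov Model technique mentioned above, a minimal discrete forward filter can track a non-observable physiological state from noisy observations (the states, observations and probabilities below are invented for the example, not values from the paper):

```python
def hmm_forward(prior, transition, emission, observations):
    """Discrete HMM forward filtering: returns P(state | observations so far).

    prior:      dict state -> probability
    transition: dict (s, s') -> P(s' | s)
    emission:   dict (s, o) -> P(o | s)
    """
    belief = dict(prior)
    for obs in observations:
        # Predict the next state, then weight by the observation likelihood.
        predicted = {s2: sum(belief[s1] * transition[(s1, s2)] for s1 in belief)
                     for s2 in belief}
        unnorm = {s: predicted[s] * emission[(s, obs)] for s in predicted}
        z = sum(unnorm.values())
        belief = {s: p / z for s, p in unnorm.items()}
    return belief

# Two hidden states (rested / fatigued), observed heart-rate bands.
prior = {"rested": 0.9, "fatigued": 0.1}
transition = {("rested", "rested"): 0.8, ("rested", "fatigued"): 0.2,
              ("fatigued", "rested"): 0.3, ("fatigued", "fatigued"): 0.7}
emission = {("rested", "low"): 0.7, ("rested", "high"): 0.3,
            ("fatigued", "low"): 0.2, ("fatigued", "high"): 0.8}
belief = hmm_forward(prior, transition, emission, ["high", "high", "high"])
print(belief["fatigued"] > 0.5)  # → True
```

Repeated "high" observations gradually overturn the strong prior that the actor is rested, which is the stochastic, dynamic behavior the linear, non-dynamic models criticized in [4] cannot capture.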
5.4 Context and data fusion
The context where the actors are performing their
tasks is important for interpretation of the status
values. For instance, it is important to know the
motion mode and velocity of the actors in order to
interpret other physiological values correctly, e.g.
their pulse. The geographical context, the weather
conditions as well as the equipment carried and
clothing worn by the actors are also crucial to the
interpretation of the status values. In the data fusion
process, these data must be collected and fused with
the actors’ state values. An example is the usage of
the 3D terrain models that can be used to improve the
estimation of the altitude of the actors. Conversely,
the actors’ state can be used to interpret the context,
e.g. if it can be detected through the motion pattern of
the actors that a certain area is difficult to traverse, it
may consequently be classified as difficult.
5.2 Data fusion for actor state estimation
The state of the actors is a joint description of
several status variables of interest in the particular
application. Position, velocity, motion mode (i.e.
running, standing, lying down, etc.) and heart rate are
fundamental variables of interest in many
applications. These variables must in some cases be
determined through the combination of data from
several sensors. For instance, by combining data from accelerometers and GPS, the motion mode can be determined. In the data fusion process, the uncertainties in the data are taken into account. Data are weighted according to their certainty, and erroneous data can be identified and excluded. Other variables can be included in the actor state, as described in section 3.1.
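The accelerometer/GPS combination for motion-mode determination can be sketched as a crude rule-based classifier (an illustration only; the thresholds are invented, not calibrated values from the system, and a real fusion process would also weight the sensors by their uncertainty):

```python
def motion_mode(accel_var, gps_speed):
    """Crude motion-mode classifier fusing accelerometer variance
    (body-movement intensity, in g^2) with GPS ground speed (m/s).
    Thresholds are illustrative, not calibrated values."""
    if gps_speed > 2.5:
        return "running"
    if gps_speed > 0.5 or accel_var > 1.0:
        return "walking"
    if accel_var > 0.1:
        return "standing"
    return "lying down"

print(motion_mode(accel_var=0.05, gps_speed=0.0))  # → lying down
print(motion_mode(accel_var=2.0, gps_speed=1.2))   # → walking
print(motion_mode(accel_var=3.0, gps_speed=4.0))   # → running
```

Using both sources makes the classification robust to single-sensor failures: GPS alone cannot distinguish standing from lying down, and accelerometer variance alone cannot distinguish walking in place from moving.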
5.5 Automatic alarms
One of the most important functions of a decision
support system for monitoring the status of
individuals are functions to relieve the users from
having to continuously monitor sensor data. An
important component in such a system is therefore
algorithms for automatic detection of the actors or the
group states that deviate from the normal or expected,
i.e. anomalies. Using algorithms for anomaly
detection, the users can be left with the task to verify
alarms given by the automatic algorithms and take
required actions when appropriate.
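For the case where "normal" is defined by the status of the other actors, a peer-based anomaly check can be sketched as a simple z-score test against the group (a minimal sketch; the actor names, heart rates and threshold are invented, and a deployed system would need context-aware models as discussed below):

```python
import statistics

def anomalous_actors(readings, z_threshold=2.5):
    """Flag actors whose value deviates from the group mean by more
    than z_threshold standard deviations ('normal' defined by peers)."""
    values = list(readings.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [actor for actor, v in readings.items()
            if abs(v - mean) / stdev > z_threshold]

heart_rates = {"a1": 95, "a2": 97, "a3": 98, "a4": 99, "a5": 100,
               "a6": 101, "a7": 102, "a8": 103, "a9": 97, "a10": 178}
print(anomalous_actors(heart_rates))  # → ['a10']
```

An alarm raised for a10 would then be verified by the user, e.g. by requesting a self-assessment from the actor, rather than acted upon automatically.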
Development of adequate algorithms for
anomaly detection is very much a research issue.
Normal values are, for instance, heavily dependent on
the context and on the task performed by the actors.
In some cases, what is “normal” can be defined by
the status of other actors, while in other situations an
5.3 Aggregated measures of actor status
An aggregated measure of actor status should be
an indicator of the ability of the actors to perform
their tasks. Different measures, using different
sensors and data fusion methods, therefore need to be
used depending on the application and the actors’
tasks. In some applications, the amount of work performed by the actors is an effective measure of
the occurring views. The operative section includes a
set of views that are of vital importance to the
ongoing work as they include the currently available
operative information; thus the views in the operative
section represent the current operational picture.
The Import/Export section is basically a buffer for incoming and outgoing information which, due to given service calls made either by a local user or by an external user, can be sent or received. The
incoming information generally contains sensor data
from groups of individuals being monitored by the
system. The context view (CXV) in the context
section is the storage point of all available
background information such as maps. The user can,
by means of the context view, define the required
area of interest (AOI) and display it in the current
operative view in the operative section resulting in
what here is called a view instance. Eventually, the
view instance of the current operative view (COV) is
completed through the overlay of either the
individuals of interest or the groups of interest to the
current mission. A view instance can then
successively be updated resulting in new instances.
The history section hosts the history view (HYV), which can be seen as a repository for all view instances created prior to the current operative view instance presently residing in the current operative view.
The current operative section, which is the most important and powerful section, contains four views for support of the operative work in the monitoring process; these four views are:
- Current operative view (COV)
- Physiological information view (PIV)
- Individuals of interest view (IIV)
- Groups of interest view (GIV)
COV, which displays the view instance
corresponding to the current operative picture, can be
directly interacted upon. For instance, by clicking at
the icons of the individuals in the view instance,
physiological information corresponding to any
group or single individual can be made available in
PIV. To allow for more complicated results this may
also be combined with a query language, see further
below. IIV and GIV show individual and group
information respectively for personnel subject to
monitoring. This information may include physiological as well as location information.
Most available services in the system are part of the views; some are simple and in many ways similar to ordinary system commands, while others correspond to conventional services. Three main groups of services have been identified: 1) view handling services, e.g. create a new view, 2) view
individual model must be used. Consequently, the
system must also allow for individual variations.
Figure 2. An overview of the system architecture.
6.1 Command and control architecture
The C2 system architecture discussed here is
service oriented and consequently highly modular. In
particular, the modular approach has its roots in the
work demonstrated by Jungert and Hallberg [12]; the
variation presented here is adapted to monitoring the
status of individuals. The system is based on what is
called the role concept model in which the basic
concepts are: (1) views, (2) services, (3) roles, and (4)
their relationships. Primarily, the model is developed
to provide for mission support in command and
control processes. The model illustrates how users
relate to their role in the information management
process. Views are made up of services and visuals,
where a visual corresponds to a visualization of a
view instance. The role concept model is further
discussed in [5] and [12].
The basic building blocks of the architecture are
the sections. The most important sections and their
relations can be seen in Figure 2. To each section a
number of views are assigned corresponding to
various specialized services and supported by one or
more visuals. Sections can be replaced when
required. The two most important sections are the
operative section and the main section. The main
section, which does not itself contain any views,
provides the main interface of the system, through
which the users can manipulate the views and their
content by means of the available services attached
to each view. The type of a service depends on its
target, as the examples given below illustrate. The number
of needed services is fairly large and some are unique
to a certain view while others occur in more than one
view. Because of the large number of services, only
some limited examples can be given here to illustrate
the service concept. An example of a view instance
handling service is:
- Request specified information (a view instance)
from a user and store the information in IMV.
Another service for view instance handling is:
- Go back and display the view instance of COV
created at time t.
This view instance is accessed from HYV and
displayed in COV.
An example of a view instance manipulation
service is:
- Update COV with information from IMV.
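To make the three service groups concrete, the sketch below models views and view instances in Python. It is an illustration only: the class and function names are hypothetical and do not appear in the system described here.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ViewInstance:
    created_at: float
    content: dict

@dataclass
class View:
    name: str
    history: list = field(default_factory=list)   # plays the role of HYV
    current: Optional[ViewInstance] = None        # instance shown in the view

# 1) A view handling service: create a new view.
def create_view(name):
    return View(name)

# 2) A view instance handling service: import a new instance into a view;
#    the previous instance is kept in the history.
def import_instance(view, content, at):
    if view.current is not None:
        view.history.append(view.current)
    view.current = ViewInstance(at, content)

# 2) A view instance handling service: go back and display the instance
#    created at time t (fetched from the history).
def go_back(view, t):
    for inst in view.history:
        if inst.created_at == t:
            view.current = inst
            return
    raise KeyError("no view instance created at time %s" % t)

# 3) A view instance manipulation service: update the current instance.
def update_instance(view, updates):
    view.current.content.update(updates)

cov = create_view("COV")
import_instance(cov, {"unit A": "nominal"}, at=1.0)
import_instance(cov, {"unit A": "fatigued"}, at=2.0)
go_back(cov, t=1.0)
print(cov.current.content)   # the instance created at t=1.0
```

The point of the sketch is only the division of labour: services act on whole views, on the set of view instances, or on the content of one instance.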
The decision support section of the system
architecture still requires further research efforts.
However, it will eventually also contain some type of
query tool. In earlier work, a query language for
heterogeneous sensor data, called 6QL, see e.g. [6],
was developed. This query tool also has capabilities
for sensor data fusion. To be used in this environment
6QL needs to be modified and simplified, mainly
because, e.g., group leaders of the monitored
individuals use PDAs to present the monitored
information. Thus, the
objective here is to adapt the query interface to a
query structure related to dynamic queries [1] but
also to make it suitable for PDAs as described and
demonstrated in [3]. The purpose of the query tool is
generally to use information available in COV, IIV
and GIV as input to the queries and produce the
requested output, either as tables or as graphs, in PIV.
As a consequence of the service-based approach,
which allows import and export of information for
all participating users, the content of the four views of
the operative section together corresponds to a shared
operational picture [13], which forms the basis for the
decision making process.
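A dynamic query in the sense of [1] re-evaluates a filter every time the user adjusts a control. A minimal sketch, with made-up field names and thresholds, of how such a query could select individuals from COV/IIV data for presentation in PIV:

```python
# Each slider adjustment or selection re-runs the query and refreshes
# the output table; the data and field names below are invented.
individuals = [
    {"id": "p1", "group": "alpha", "heart_rate": 142, "core_temp": 38.9},
    {"id": "p2", "group": "alpha", "heart_rate": 95,  "core_temp": 37.1},
    {"id": "p3", "group": "bravo", "heart_rate": 168, "core_temp": 39.4},
]

def dynamic_query(data, hr_range=(0, 300), group=None):
    """Return the rows matching the current slider/selection state."""
    lo, hi = hr_range
    return [row for row in data
            if lo <= row["heart_rate"] <= hi
            and (group is None or row["group"] == group)]

# Simulate the user narrowing the heart-rate slider to 120-200 bpm:
print(dynamic_query(individuals, hr_range=(120, 200)))
# ...and then restricting the query to group "alpha":
print(dynamic_query(individuals, hr_range=(120, 200), group="alpha"))
```

On a PDA, per [3], the same filters would be bound to small interactive widgets rather than typed expressions.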
6.2 After action review
After action review (AAR) [10] is a formalized
method for evaluation of exercises and operations.
The F-REX method and tools [2] support this type of
procedure through the introduction of Reconstruction
& Exploration [17]. F-REX supports after-action
reviews by enabling visualization of any data type in
chronological order and related to other data sources,
giving the analysts the opportunity to quickly get an
understanding of the data being observed in
relationship to the context in which it was sampled.
Figure 3 shows such a context from a rescue services
exercise in Sweden in 2006 [2]. The system visualizes
and plays back concurrent data from several data
sources dispatched over a large area, enabling the
users to quickly get an overview of the current
situation at different locations and thus a relevant
context for the analyses.
Figure 3. Synchronized visualization of sampled data and
multimedia information for contextual analysis in F-REX Studio.
The current layout displays a priori information, timeline, photos,
video, a map with GPS tracks, heart rate, altitude, stance and
statistical metrics.
F-REX partly implements the system architecture
described above. Figure 3 shows several visuals
synchronized automatically to a COV and with
contextual information provided by the CXV (i.e., the
map in the GPS track visual, statistics for the chart
visual and photo/video in the multimedia visual). The
timeline at the bottom of the screenshot provides an
interface to easily access the HYV, while IIV and
GIV can be set up using the tree-structured a priori
interface to the left. By selecting individuals or
groups of individuals, the heart rate, map and stance
visuals are updated to reflect the current selection.
This tool is ideal for post-action analysis of data
gathered by the monitoring system. Further,
extensions allowing online visualization of data in
F-REX are being planned, so that the system can be
used as a real-time decision support tool and not just
in post real-time reviews.
The research literature exhibits a large number of
works where the monitoring of the individuals’
physiological status is in focus. Generally, this type
of monitoring is also used in many different
applications. However, related work of particular
interest here primarily concerns physiological
monitoring of individuals and groups of individuals.
Integration of such systems with command and
control systems is another important issue to deal
with. Other literature of interest
concerns decision support tools used in this context.
McGrath et al. [16] discuss a crisis management
system called ARTEMIS with the primary purpose to
improve the care of wounded soldiers. The system is
a part of a command and control system and it has
also been developed with the intention to improve
the users’ situation awareness, which is carried out by
improved information gathering even under severe
situations. The authors also argue that, from this
perspective, more reliable decisions can be taken.
AID-N [9] is a triage system with command and
control capacity based on SOA (service oriented
architecture). AID-N is thus a service oriented
approach. The system exploits shared data models
across heterogeneous subsystems and can be seen
as a test bed for improved co-operation between
crisis management organizations. A powerful aspect
of the system is that, through its service architecture, it
allows for a simplified distribution (sharing) of data
between users of the subsystems.
Among the different monitoring systems the
work by Lin et al. describes a system called
RTWPMS [14], which is a mobile system supporting
examination of patients where physiological
information is measured by means of sensors; e.g. for
measuring of blood pressure and temperature.
Another example of a monitoring system is described
by Lorincz et al. [15], [19]. This system corresponds
to a surveillance system using a sensor data network
for data gathering. The primary applications of
concern fall in the areas of crisis management and
medical surveillance of patients. In this system,
simple queries can be posed as well. Another
monitoring system that relates to the work discussed
here is described in [22].
An example of a system for extensive medical
decision making and which also uses methods for
sensor data fusion is discussed in [20]. Stacey and
McGregor [21] describe a system that can perform
intelligent analysis of clinical data. In [18], a network
approach for measuring physiological parameters is
presented.
In this work, a system for monitoring the
physiological status of individuals has been
discussed. Physiological parameters are measured by
means of sensors attached to the bodies of the
individuals in focus for the monitoring process. The
parameters are then transferred and further analyzed
in a system for determination of the individuals'
status; this system will eventually also be integrated
with a command and control system that can be used
for both military and civilian applications. Of
importance is also the integration of an after action
review system. The purpose of the latter system is to
offer techniques and methods to give the users
improved means to judge the consequences of certain
operations in which the individuals are involved, but
also to see how these individuals react to the given
circumstances. In the decision support component
this may be combined with geographic information,
weather and other relevant information.
Other aspects that will need further attention in
future research will be concerned with the
development of methods for automatic alarms
through anomaly detection. Methods for tracing the
general state of the individuals in combination with
their motion patterns are also of concern.
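As a rough illustration of the envisioned automatic alarms, the sketch below flags heart-rate samples that deviate strongly from a short moving baseline. This is a deliberately simple stand-in; as noted above, usable anomaly detection would require individually calibrated models.

```python
from statistics import mean, stdev

def anomaly_alarms(samples, window=5, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the preceding `window` samples (a toy baseline, not
    a validated physiological model)."""
    alarms = []
    for i in range(window, len(samples)):
        ref = samples[i - window:i]
        sd = stdev(ref)
        if sd > 0 and abs(samples[i] - mean(ref)) / sd > threshold:
            alarms.append(i)
    return alarms

heart_rate = [72, 74, 73, 75, 74, 73, 190, 74]   # one implausible spike
print(anomaly_alarms(heart_rate))   # alarm at index 6 (the 190 bpm sample)
```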
[1] Ahlberg, C., Williamson, C., Shneiderman, B., Dynamic queries for information exploration: an implementation and evaluation, Proceedings of the Conference on Human Factors in Computing Systems (CHI '92), ACM Press, New York, 1992, pp. 619-626.
[2] Andersson, D., Pilemalm, S., Hallberg, N., Evaluation of crisis management operations using Reconstruction and Exploration, Proceedings of the 5th International ISCRAM Conference, Washington, DC, May 4-7, 2008.
[3] Burigat, S., Chittaro, L., Interactive visual analysis of geographic data on mobile devices based on dynamic queries, Journal of Visual Languages and Computing, Vol. 19, No. 1, February 2008, pp. 99-122.
[4] Carson, E., Cobelli, C., Modeling Methodology for Physiology and Medicine, Academic Press, San Diego, CA, USA, 2001.
[5] Chang, S.-K., Jungert, E., A Self-Organizing Approach to Mission Initialization and Control in Emergency Management, Proceedings of the International Conference on Distributed Multimedia Systems, San Francisco, CA, September 6-8, 2007.
[6] Chang, S.-K., Jungert, E., Li, X., A Progressive Query Language and Interactive Reasoner for Information Fusion, Journal of Information Fusion, Elsevier, Vol. 8, No. 1, 2006, pp. 70-83.
[7] Committee on Metabolic Monitoring for Military Field Applications, Monitoring Metabolic Status: Predicting Decrements in Physiological and Cognitive Performance, National Academies Press, Washington, DC, USA, 2004.
[8] Hall, D. L., Llinas, J. (Eds.), Handbook of Multisensor Data Fusion, CRC Press, New York.
[9] Hauenstein, L., Gao, T., Sze, T. W., Crawford, D., Alm, A., White, D., A Cross Service-oriented Architecture to Support Real-Time Information Exchange in Emergency Medical Response, 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS '06), New York, NY, Aug. 2006.
[10] Headquarters, Department of the Army, A Leader's Guide to After-Action Reviews (TC 25-20), Washington, DC, 30 September 1993.
[11] Hoyt, R. W., Buller, M., Zdonik, S., Kearns, C., Freund, B., Obusek, J. F., Physio-Med Web: Real Time Monitoring of Physiological Strain Index (PSI) of Soldiers During an Urban Training Operation, RTO HFM Symposium on "Blowing Hot and Cold: Protecting Against Climatic Extremes", Dresden, Germany, 8-10 October 2001.
[12] Jungert, E., Hallberg, N., An Architecture for an Operational Picture System for Crisis Management, Proceedings of the 14th International Conference on Distributed Multimedia Systems, Boston, MA, Sept. 4-6, 2008.
[13] Jungert, E., A Theory on Management of Shared Operational Pictures for Command and Control Systems Design, IADIS International Conference on Information Systems, Barcelona, Spain, Feb. 25-27, 2009.
[14] Lin, B.-S., Chou, N.-K., Chong, F.-C., Chen, S.-J., RTWPMS: A Real-Time Wireless Physiological Monitoring System, IEEE Transactions on Information Technology in Biomedicine, Vol. 10, No. 4, Oct. 2006, pp. 647-656.
[15] Lorincz, K., Malan, D. J., Fulford-Jones, T. R. F., Nawoj, A., Clavel, A., Shnayder, V., Mainland, G., Welsh, M., Moulton, S., Sensor Networks for Emergency Response: Challenges and Opportunities, IEEE Pervasive Computing, Oct.-Dec. 2004, pp. 16-23.
[16] McGrath, S. P., Grigg, E., Wendelken, S., Blike, G., De Rosa, M., Fiske, A., Gray, R., ARTEMIS: A Vision for Remote Triage and Emergency Management Information Integration, Dartmouth University, Nov. 2003.
[17] Morin, M., Multimedia Representation of Distributed Tactical Operations, Linköping Studies in Science and Technology, Dissertation No. 771, Linköping University, Linköping, Sweden, 2002.
[18] Rahman, F., Kumar, A., Nagendra, G., Gupta, G. S., Network Approach for Physiological Parameters Measurement, IEEE Transactions on Instrumentation and Measurement, Vol. 54, No. 1, Feb. 2005, pp. 337-346.
[19] Shnayder, V., Chen, B.-R., Lorincz, K., Fulford-Jones, T. R. F., Welsh, M., Sensor Networks for Medical Care, Technical Report TR-08-05, Division of Engineering and Applied Science, Harvard University, 2005.
[20] Sintchenko, V., Coiera, E. W., Which clinical decisions benefit from automation? A task complexity approach, International Journal of Medical Informatics, Vol. 70, 2003, pp. 309-316.
[21] Stacey, M., McGregor, C., Temporal abstraction in intelligent clinical data analysis: A survey, Artificial Intelligence in Medicine, Vol. 39, No. 1, Jan. 2007, pp. 1-24.
[22] Yu, S.-N., Cheng, J.-C., A Wireless Physiological Signal Monitoring System with Integrated Bluetooth and WiFi Technologies, Proceedings of the 27th Annual IEEE Conference on Engineering in Medicine and Biology, Shanghai, China, Sept. 1-4, 2005, pp. 2203-2206.
Assessment of IT Security in
Emergency Management Information Systems
Johan Bengtsson, Jonas Hallberg, Thomas Sundmark, and Niklas Hallberg
Swedish Defence Research Agency, Linköping, Sweden
Abstract—During emergency management the security of
information is crucial for the performance of adequate and
necessary operations. Emergency management personnel
commonly have only novice-level skills in, and little interest
in, IT security. During incidents they are fully preoccupied with the crisis
management. Hence, the security mechanisms have to be well
integrated into the emergency management information systems
(EMIS). The objective of this paper is to illustrate how security
assessment methods can be used to support decisions affecting
the information security of EMIS. The eXtended Method for
Assessment of System Security (XMASS) and the accompanying
Security AssessmeNT Application (SANTA) are introduced. The
method and tool support the security assessment of networked
information systems capturing the effects of system entities as
well as the system structure. An example is provided to illustrate
the use of the method and tool as well as the importance of
effective firewalls in networked information systems.
Index Terms—IT security, IT security assessment, emergency
management information systems

When emergencies occur, there is little time to consider
other issues than how to handle the situation at hand. Focus is
required in order to minimize the negative consequences of
the situation. Critical decisions often have to be made based
on uncertain information. Thus, the decisions have a
significant impact on how successfully situations are handled.
The information used as a foundation for these decisions is
increasingly generated, communicated, processed, provided
and interpreted by means of information technology (IT)
based systems, i.e., emergency management information
systems (EMIS). EMIS are decision support systems to be
used in all parts of emergency management and response [1].
They support the emergency managers in planning, training
and coordinating operations [2]. EMIS can be used, e.g., to
display and analyze possible event locations, available
resources, transportation routes, and population at risk [3].
EMIS have the potential to dramatically increase our ability to
foresee, avert, prepare for and respond to extreme events [4].
However, with extensive use, the dependency on EMIS
increases and, consequently, so does the need for trusted and
reliable EMIS. Thereby, IT security issues are vital to consider
for the information systems supporting emergency
management. Thus, it is essential to have a valid
understanding of the security posture of EMIS. A serious
problem is posed by the fact that if there is no method to
establish the current level of security in EMIS, then there is no
way to decide whether the IT security levels of these systems
are adequate. Furthermore, the effect of any actions to
improve the IT security will be unknown.
Thus, it is crucial to design methods that remove the ad hoc
nature of security assessment for EMIS. In this paper, a
structured method for the assessment of EMIS is presented.
The method has been implemented as a tool, which is used to
assess security levels of coalition networks at the Combined
Endeavor, an international communications and information
system interoperability exercise.
This section presents IT security, IT security assessment
and the context of the study.
A. IT security
IT security, also referred to as computer security, is defined
in many different ways depending on the context. Excellent
descriptions of various aspects of IT security are provided by,
e.g., Anderson [5], Bishop [6] and Gollmann [7].
Consequently, it is hard to give an explicit definition, which is
suitable for all contexts. Gollmann [7] states that there are
several possible definitions, such as, “deals with the
prevention and detection of unauthorized actions by users of a
computer system." In this paper, the term IT security relates to
upholding the characteristics of confidentiality, integrity, and
availability of IT systems and the data processed, transmitted,
and stored in these systems.
B. IT security assessment
Assessment of IT security is performed in order to establish
how well IT systems meet specified security criteria, based on
measurements of security relevant system characteristics or
effects. Hubbard [8] points out that in order to measure
something, it has to be distinctly clear what it is that should be
measured. However, measurements do not have to yield exact
results. Successful measurements improve the knowledge
about the studied phenomena [8]. Hence, IT security
assessments are to provide knowledge about the security of IT
systems. This knowledge can be used to support, e.g.:
- the comprehension of the current security posture by the
actors responsible for the IT security,
- the development and operation of information systems,
e.g., EMIS, with adequate security levels,
- risk management,
- training and awareness concerning IT security,
- the communication of IT security issues,
- security management, and
- trust in IT systems [9].
Although IT security deals with technical elements,
comprehensive IT security assessments need to consider other
related aspects, such as the organizational, human, and
contextual aspects. The inclusion of these aspects emphasizes
the need to consider their influence on the security levels of
systems. However, IT security assessments do not include the
assessment of the security of organizations, persons, and
contexts themselves.
Several approaches to security assessment have been
presented. Security metrics programs refer to the process of:
- identifying measurable system characteristics and effects,
- measuring these security characteristics and effects, and
- producing illustrative, comprehensive presentations of the
results [10-12].
Adequate security metrics should be consistently measured,
inexpensive to collect, expressed by numbers, and have a unit,
such as seconds [10]. The interpretation of specific security
metrics is left to the user. Proponents of security metrics
programs claim that the characteristic of triggering discussions
on the meaning of the presented results is a key benefit. In
contrast, the approach presented in this paper, the eXtended
Method for Assessment of System Security (XMASS), aims at
providing system-wide security assessment values including
the effects of system structure and inter-connections [13,14].
Thus, the whole system is considered during the assessment
rather than isolated system characteristics or effects.
Attack-based methods assess systems based on the steps
that adversaries have to complete in order to achieve their
goals, e.g., [15,16]. The method based on the weakest-adversary security metric aims to enable the comparison of
different system configurations based on the attributes
required to breach their security [16]. Characteristics of
network configurations and the current attack stages, e.g., root-level shell access on a specific host, form the states of the
system models. The transition rules describe the requirements
for and consequences of the transitions from system states into
other system states. Describing the actual prerequisites of
successful attacks, the presented results are intuitive.
However, the analysis of results may not be as
straightforward, e.g., when making comparisons of the system
effects resulting from different system configurations. The
XMASS does not require knowledge of the specific
vulnerabilities that can be used to penetrate systems; instead,
assessments are based on the security qualities of systems.
Methods based on system characteristics combine values of
selected characteristics to produce security values which
represent the security levels of complete systems. The
Security Measurement (SM) framework is used to estimate
scalar security values [17]. In order to transform relevant
security characteristics into measurable system effects or
characteristics, a decomposition method is described. The
outcome is a tree with measurable security characteristics as
leaves. For the aggregation of security values, the weights and
mathematical functions capturing the relations between the
nodes in the resulting tree have to be decided. Because of the
generality of the method, large efforts are required to design
specific methods based on the framework. Since assessments
based on the XMASS can utilize different sets of security
characteristics to capture the security levels of systems, the
process of systems modeling is more clearly specified. Like
security metrics programs, the SM framework lacks support
for capturing the security effects of system structure, which is
explicitly supported by the XMASS.
C. Study context
The Combined Endeavor constitutes an extensive
communications and information system interoperability
exercise. The participants are members of the North Atlantic
Treaty Organization (NATO) and the Partnership for Peace
(PfP). During Combined Endeavor 2007 Sweden participated
with equipment in, and connected external networks to, the
established Region B network. This network is used as the
target of evaluation in this paper.
In order to assess the security of systems, it is essential to
capture the underlying characteristics and effects related to the
systems as well as defining how the computation of security
values should be performed. Thus, both the systems to be
assessed and the computations to be performed have to be
modeled. Provided these models, the base data has to be
captured and the aggregated values have to be computed in
order to receive the final assessment results. To benefit from
the produced results, their presentation has to be adapted to
the recipient (Figure 1).
Figure 1: The outline of methods for security assessment (systems
modeling, computations modeling, security values measurement,
assessment results, and presentation of results).
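The outline in Figure 1 can be read as a small pipeline: model the system, model the computations, measure base values, aggregate, and present. A toy rendering in Python, with invented entity and characteristic names and an invented weighting scheme:

```python
def assess(system_model, computation_model, measurements):
    """Aggregate measured base values per entity according to the
    computation model (a plain weighted sum, for illustration only)."""
    results = {}
    for entity in system_model["entities"]:
        base = measurements[entity]                 # measured base data
        weights = computation_model["weights"]      # how values combine
        results[entity] = sum(w * base[k] for k, w in weights.items())
    return results

def present(results):
    """Adapt the results for the recipient; here, a plain text listing."""
    return "\n".join(f"{e}: {v:.2f}" for e, v in sorted(results.items()))

system_model = {"entities": ["fw1", "ws1"]}
computation_model = {"weights": {"confidentiality": 0.5, "integrity": 0.5}}
measurements = {"fw1": {"confidentiality": 0.8, "integrity": 0.6},
                "ws1": {"confidentiality": 0.4, "integrity": 0.9}}
print(present(assess(system_model, computation_model, measurements)))
```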
The eXtended Method for Assessment of System Security
(XMASS) [13,14] has been formulated according to the
structure presented in Figure 1 and to fulfill the following
objectives:
- Provide users with relevant data on the IT security
posture of networked information systems.
- The effects of system entities as well as the system
structure should be captured.
- Since there is no fixed definition of IT security, the
method should support the assessment of the different
security aspects which together constitute the user's
definition of IT security.
- The method should be flexible in order to support the
diverse needs of different users.
- The reuse of assessment data should be supported.
In XMASS, assessments are based on the available
knowledge regarding the security characteristics of the system
entities and their relations [13]. The system modeling is
supported by the possibility to create profiles for standardized
system entities and their relations. There are no explicit
limitations in the method regarding which system entities can
be modeled.
The computation of higher-level security values is
controlled by the computations model, which can be specified
by the users, but is tied to the structure of the system. Thus,
the computation of aggregated security values, not just the
input, depends on the system models as well as the
computations models. The assessment results are presented for
individual entities, for entities in a system context, and for the
entire system.
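The profile mechanism can be illustrated as follows: entities hold references to shared profiles, so editing one profile changes the values used by every entity of that type. The classes, requirement names, and numbers below are illustrative only, not part of XMASS itself.

```python
class SecurityProfile:
    """Fulfilment values (0..1) per security requirement, shared by
    all entities of a standardized type."""
    def __init__(self, fulfilment):
        self.fulfilment = dict(fulfilment)

class Entity:
    def __init__(self, name, profile):
        self.name = name
        self.profile = profile   # a reference, not a copy: profiles are reused

    def value(self):
        """Toy entity-level security value: mean requirement fulfilment."""
        f = self.profile.fulfilment
        return sum(f.values()) / len(f)

workstation = SecurityProfile({"patching": 0.8, "access_control": 0.6})
ws1, ws2 = Entity("ws1", workstation), Entity("ws2", workstation)
print(ws1.value(), ws2.value())            # both entities score alike

workstation.fulfilment["patching"] = 1.0   # one change updates every user
print(ws1.value(), ws2.value())
```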
A. XMASS tool
The tool implementation of XMASS is based on the NTE
(New Tool Environment) [18], which is a software framework
supporting the implementation of security assessment
methods. NTE supports the definition of Requirement
Collections (RCs), which enable the specification of different
security features. These security features can in turn be broken
down into a number of security requirements. NTE simplifies
the implementation of tools for security assessment methods
by providing basic functionality such as:
- file handling for organizing systems and projects,
- a data access layer to provide a simple way of reading
from and writing to the database, and
- well defined interfaces to facilitate the implementation.
The actual systems modelling and assessment functionality
is implemented as a plug-in for the NTE, called SANTA
(Security AssessmeNT Application). The SANTA is designed
to facilitate the variation of values and settings, which makes it
possible to evaluate the XMASS and improve its functionality.
For example, the security-related values of a modelled entity
are structured as a profile which can be reused by other
entities of the same type. A change in one profile affects all
entities using that specific profile.
B. System security assessment workflow
System security assessments in the SANTA are performed
according to the workflow illustrated in Figure 2. A white
background indicates that the activity is part of the calculation
modelling, while a blue background indicates that the activity
is part of the system modelling. The workflow consists of five
activities: (1) Create Requirement Collection, (2) Create
templates, (3) Create profiles, (4) Create system model and (5)
Perform system assessments. The activities are described in
the following sections.
Figure 2: The workflow for security assessments.
1) Create Requirement Collection
A Requirement Collection (RC) is a specification of the
security features that should be regarded during the security
assessment. Each security feature is mapped to a set of
security requirements. The fulfilment of these security
requirements will decide the security values of systems and
system entities corresponding to this security feature. Higher
security values for a security feature indicate that the feature is
adequately supported by the assessed system or system entity.
The RC is the basis for the security assessment since it
specifies what needs to be fulfilled in order to receive
favourable assessment results. The templates and profiles
created in the following steps are all dependent on the
specified RC.
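Conceptually, an RC can be pictured as a mapping from security features to requirement sets. A toy sketch with invented feature and requirement names follows; the scoring shown is a plain average, not the XMASS computation:

```python
# Hypothetical Requirement Collection (RC): each security feature maps
# to the requirements whose fulfilment determines that feature's value.
requirement_collection = {
    "Access Control": ["unique user accounts", "least privilege"],
    "Security Logging": ["log authentication events", "protect log integrity"],
}

def feature_score(rc, feature, fulfilment):
    """Average fulfilment (0..1) over the feature's requirements; higher
    values indicate that the feature is better supported."""
    reqs = rc[feature]
    return sum(fulfilment.get(r, 0.0) for r in reqs) / len(reqs)

fulfilment = {"unique user accounts": 1.0, "least privilege": 0.5}
print(feature_score(requirement_collection, "Access Control", fulfilment))
```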
2) Create templates
The security profile template defines the importance of each
requirement specified in the RC. The requirements of each
security feature are divided into two categories; fundamental
requirements and important requirements. Fundamental
requirements have to be fulfilled in order for the assessed
entity to be considered as fulfilling the security feature. The
important requirements, on the other hand, are prioritized
regarding their relative importance. The prioritization is
the physical relation profile or a suiting logical relation profile
is selected. To support the modelling of extensive systems, it
is possible to specify sub-systems that can be instantiated in
the visual system model.
During the visual modelling of the system, it is essential
that all the necessary profiles are available. If any profiles are
missing, the third step of the process has to be revisited.
5) Perform system assessments
Once the computation modelling and the system modelling
have been completed the system assessment can start. The
foundation for the security values produced by the XMASS is
the System-dependent Security Profiles (SSPs) that are
computed for all the traffic generators in the system model.
The computation of the SSPs depends on the specified
computation and system models.
The SANTA offers different ways to extract assessment
results from the system model. Next to the modelling surface
is a panel showing the calculated system-dependent security
values which are aggregated security values reflecting the
system as a whole (Figure 3).
For more advanced assessments of a system, there is a builtin evaluation tool which makes it possible to generate graphs
of how changes of security values affect the security. This can
for example be used to illustrate how the security values are
affected if the filtering policies of used firewalls are changed.
There is also a built-in tool for calculating how much each
entity affects the security of each other entity in the system.
This tool can for example be used to identify weak spots in the
system, i.e. the entities having the worst influence on the other
entities in the system.
performed with a method based on the criteria weighting used
in the Analytic Hierarchy Process, AHP, [19] and decides to
what extent each requirement affects the security value of the
regarded security feature. It is possible to regulate the
maximum total influence of the important requirements.
The filter profile template defines how the specified
network traffic filtering functionalities affect the security
value of each security feature. The relative influence of the
filtering functionalities is, for each security feature, specified
with the help of the method based on the AHP [19]. It is
possible to specify the maximum effect a traffic filter can have
on each security value.
3) Create profiles
A profile is a grouping of values which concerns one or
more entities or relations. The main reason for grouping
values into profiles is to facilitate the modelling and simplify
the variation of values, i.e., an alteration of a profile affects all
entities or relations using that specific profile. There are four
types of profiles: (1) security profiles, (2) filter profiles, (3)
physical relation profiles, and (4) logical relation profiles.
There are two main types of entities defined in the XMASS,
traffic generators and traffic mediators. A traffic generator is
an entity which generates traffic and can for example be a
workstation computer or a server. A traffic mediator is on the
other hand an entity which only mediates traffic and can for
example be a router or a switch. Each entity in a system has a
security profile which describes to what degree the entity
fulfils the security requirements specified in the RC. A
fulfilment value of 1 indicates complete fulfilment of a
requirement, while 0 indicates non-compliance. A fulfilment
value between 0 and 1 indicates partial fulfilment.
The filtering functionality and capability of different traffic
mediators can differ widely. Therefore filter profiles are used
to specify how the filtering of the mediator affects the system
Relations are described using relation profiles. There are
two types of relation profiles; one for physical relations and
one for logical relations. The physical relation profile differs
from the other profiles by being specified as a system-wide
setting. Hence all physical relations in a system are modelled
using the same physical relation profile. The physical relation
profile describes associations between entities interconnected
through physical means such as wired or wireless
communication. The logical relation profiles are, on the other
hand, specified per relation and describe logical relations such
as VPN tunnels etcetera.
4) Create system model
Once the previous three steps have been completed, it is
possible to start with the visual modelling of the system.
Entities and relations are created by simply clicking, dragging
and dropping in the modelling surface. When creating a new
entity the first step is to choose whether to create a traffic
generator or a traffic mediator. For the traffic generator, only a
security profile needs to be selected, while the traffic mediator
needs a filter profile as well. When creating a relation, either a
physical or a logical relation is chosen; a logical relation also
requires a logical relation profile, while all physical relations
share the system-wide physical relation profile.
Figure 3: An overview of SANTA.
The modeled system used in this security assessment is, as
mentioned earlier, Region B of the network used at the
Combined Endeavor 2007 (CE07). The graphical view of the
SANTA model of the network is presented in Figure 4.
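The entity-and-relation modelling of step 4 can be sketched as a tiny data structure; all entity and profile names below are illustrative only, not taken from the XMASS/SANTA tools:

```python
# Illustrative sketch of building a system model as a set of traffic
# generators and traffic mediators connected by relations.

class Entity:
    def __init__(self, name, security_profile, filter_profile=None):
        self.name = name
        self.security_profile = security_profile  # required for all entities
        self.filter_profile = filter_profile      # traffic mediators only

workstation = Entity("ws-1", security_profile="office-pc")
firewall = Entity("fw-1", security_profile="hardened", filter_profile="strict")

# A physical relation; logical relations (e.g. VPN tunnels) would carry
# their own per-relation profile instead of the system-wide physical one.
relations = [(workstation.name, firewall.name, "physical")]
print(relations)  # [('ws-1', 'fw-1', 'physical')]
```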
The purpose of the designed network was to connect the subnets of the participating nations to a core network in order to allow them to communicate with each other and the other nations connected to the core network. The participating nations controlled their own subnets, so while designing the Region B network the security focus was set on the firewalls in between the subnets. All firewalls used in the network were of the model Färist, which is used by the Swedish Armed Forces. For a specification of the hardware used in the Region B network, along with the requirement collection, templates, profiles and settings used in the model, refer to [20].

There are three different types of symbols used in the model, representing traffic mediators, traffic generators and subnets of traffic generators. A subnet represents a given quantity of identical traffic generators interconnected through a switch. In this network each subnet represents ten workstations using Microsoft Windows XP SP2. Information about the actual number of workstations per subnet was not available at the time of the modelling.

Figure 4: The model of the CE07 Region B network.

As mentioned earlier, the firewalls are central to the security level of the CE07 network. To illustrate their importance, a security assessment is performed for different levels of traffic filtering. The requirement collection used for the security assessment is the collection of requirements on security mechanisms used by the Swedish Armed Forces [21]. This collection regards the security features Access Control, Intrusion Prevention, Intrusion Detection, Security Logging and Protection against Malware. In Figure 5, the graph represents the system-wide security profile, i.e., an aggregation of all the entity SSPs in the system model. The security values are plotted in the graph as the filtering capabilities of the firewalls are linearly increased from zero to the maximum level of the firewalls.

Figure 5: Assessment results.

To further illustrate the importance of the firewalls, an incident is modeled in which an unprotected wireless router is connected to the network. This opens the network to unknown, and probably also unwanted, clients with an unknown level of security. This threat has been modeled as a subnet of ten traffic generators having the lowest possible security level, connected to the network at the same switch as the UK subnet (Figure 6).

Figure 6: Changes made to the network.

By performing the same security assessment as with the original model, the importance of the firewalls becomes even more obvious (Figure 7).
Figure 7: Assessment results for the modified network.

In emergency management many critical decisions are based on information obtained through the use of information systems [2], [4]. To obtain adequate and effective emergency responses, it is crucial that emergency managers can trust and rely on the provided information. Hence, the ability to ensure a sufficient level of IT security within emergency management information systems (EMIS) is essential. This can be achieved through methods and tools for IT security assessment. This paper presents the method XMASS and the tool SANTA, enabling the assessment of IT security.

In XMASS, the assessments capture the effects of system entities as well as the system structure. The CE07 example presented in this paper illustrates how filtering affects the security levels in large networks. As can be seen from the results presented in Figure 5 and Figure 7, the security values for AC and PM are constant. This is because these security features have been modeled to be independent of the security level of the other system entities. The security values for IP are generally low when no filtering is active in the firewalls. This is because the IP value of the security profiles is relatively low and there are in total many entities collectively affecting the values of the SSPs. When the filtering capabilities of the firewalls increase, the security values corresponding to SL, ID and IP improve. This is because the non-perfect values of the neighbors shielded off by firewalls increase due to filtering.

The importance of filtering is illustrated by the fact that the relative difference between the security values of the networks, with and without the unknown clients connected through the unprotected wireless router, decreases with more effective filtering. For example, considering the ID security feature, the security value decreases by 58% when there is no filtering and by 16% when the maximum filtering of the modeled firewalls is assumed.

The usability of EMIS as support for decision making within emergency management requires the integrity as well as the availability of critical information. Modern EMIS are network-based and connected to public networks. Hence, the traffic filtering capability of EMIS is one crucial aspect in order to reach and maintain both integrity and availability. This aspect is regarded in the assessments performed with XMASS and SANTA. Hence, such methods and tools support the design, configuration and operation of trustworthy and reliable information systems for emergency management.

References

[1] M. Kwan and J. Lee, "Emergency Response after 9/11: the potential of real-time 3D GIS for quick emergency response in micro-spatial environments," Computers, Environment and Urban Systems, vol. 29, 2005, pp. 93-113.
[2] D. Ozceylan and E. Coskun, "Defining Critical Success Factors for National Emergency Management Model and Supporting the Model with Information Systems," Proc. 5th International Conference on Information Systems for Crisis Response and Management (ISCRAM 2008), F. Fiedrich and B. Van de Walle, eds., Washington, DC, USA, 2008, pp. 276-283.
[3] S. Pilemalm and N. Hallberg, "Exploring Service-Oriented C2 Support for Emergency Response for Local Communities," Proc. 5th International Conference on Information Systems for Crisis Response and Management (ISCRAM 2008), F. Fiedrich and B. Van de Walle, eds., Washington, DC, USA, 2008.
[4] J.R. Harrald, "Agility and Discipline: Critical Success Factors for Disaster Response," The ANNALS of the American Academy of Political and Social Science, 2006.
[5] R. Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems, Wiley, 2001.
[6] M. Bishop, Computer Security - Art and Science, Addison-Wesley, 2003.
[7] D. Gollmann, Computer Security, Chichester: Wiley, 2006.
[8] D.W. Hubbard, How to Measure Anything: Finding the Value of "Intangibles" in Business, Hoboken, N.J.: John Wiley & Sons, 2007.
[9] N. Hallberg, J. Hallberg, and A. Hunstad, "Rationale for and Capabilities of IT Security Assessment," Proc. IEEE SMC Information Assurance and Security Workshop (IAW '07), 2007, pp. 159-166.
[10] A. Jaquith, Security Metrics: Replacing Fear, Uncertainty, and Doubt, Addison-Wesley, 2007.
[11] D. Herrmann, Complete Guide to Security and Privacy Metrics: Measuring Regulatory Compliance, Operational Resilience, and ROI, Auerbach Publications, 2007.
[12] E. Chew, M. Swanson, K. Stine, N. Bartol, A. Brown, and W. Robinson, Performance Measurement Guide for Information Security, National Institute of Standards and Technology, 2008.
[13] J. Hallberg, N. Hallberg, and A. Hunstad, Crossroads and XMASS: Framework and Method for System IT Security Assessment, Swedish Defence Research Agency, FOI, 2006.
[14] J. Hallberg, J. Bengtsson, and R. Andersson, Refinement and Realization of Security Assessment Methods, Swedish Defence Research Agency, FOI.
[15] B. Laing, M. Lloyd, and A. Mayer, "Operational Security Risk Metrics: Definitions, Calculations, and Visualizations," Metricon 2.0, Boston.
[16] J. Pamula, S. Jajodia, P. Ammann, and V. Swarup, "A weakest-adversary security metric for network configuration security analysis," Proc. 2nd ACM Workshop on Quality of Protection, 2006.
[17] C. Wang and W. Wulf, "A Framework for Security Measurement," Proc. National Information Systems Security Conference, 1997, pp. 522-533.
[18] J. Bengtsson and P. Brinck, "Design and Implementation of an Environment to Support Development of Methods for Security Assessment," Linköping University, Department of Electrical Engineering, 2008.
[19] T. Saaty, Fundamentals of Decision Making and Priority Theory with the Analytic Hierarchy Process, Pittsburgh: RWS Publications, 1994.
[20] T. Sundmark, "Improvement and Scenario-based Evaluation of the eXtended Method for Assessment of System Security," Linköping University, Department of Electrical Engineering, 2008.
[21] Swedish Armed Forces, Requirements on Security Mechanisms (in Swedish: Krav på SäkerhetsFunktioner), Headquarters, 2004.
F-REX: Event Driven Synchronized Multimedia Model Visualization
Dennis Andersson
Swedish Defense Research Agency
[email protected]
Abstract

Reconstruction and Exploration (R&E) was developed to analyze complex chains of events in distributed tactical operations. The approach specifically points out domain analysis, modeling, instrumentation and data collection as the reconstruction steps that enable exploration through presentation. In reality, however, analysts often want to iterate the presentation step and feed data back into the model, enabling iterative analysis. This work presents an improved version of the R&E approach that better fits the way analysts work.

While it would be possible to force the improved version of R&E into existing tools, the increasing amount of multimedia data becoming available, such as video and audio, motivates a redesign of existing tools to better support the new model. This paper also presents F-REX as the first tool tailored to deal with multimedia-rich models for R&E and streamlined to follow the improved R&E approach.

1. Introduction

Analyzing cause and effect in a complex chain of events spanning a large area is a very difficult task for any analyst, since the analyst needs to understand what is going on at multiple locations simultaneously. Because it is obviously impossible for an analyst to observe everything first hand, methods and tools are needed to overcome this problem. One promising approach is Reconstruction & Exploration (R&E) [8], which makes use of a multimedia model of the operation and enables post-action analysis. R&E has been used successfully in several domains, such as military exercises, live fire brigade operations, staff exercises and more.

Closely linked to R&E is the MIND framework, which was the reference implementation of a toolset supporting R&E [7], [8]. This system is streamlined to support modeling, instrumentation, data collection and presentation, and makes for an excellent tool to support debriefings or after-action reviews (AARs) [9], [10], [6]. However, after several years' usage it has become more and more apparent that the design of MIND does not scale very well to the increasing amount of data that becomes available as technology becomes more sophisticated. Also, the R&E model does not capture the way analysts work in practice, so the updating of MIND also called for an update of the R&E model to fill in the gaps and better reflect how it is used in practice. The improved R&E approach is in this paper referred to as the F-REX approach [3] to distinguish between the two versions. The new tool, F-REX Studio, is streamlined to fit the F-REX approach.

Figure 1. The improved R&E approach workflow with changes from original R&E outlined.

2. Design Goals

R&E was designed to let the analyst play back the course of events, much like one would in a DVD player, and then pause or stop to interact with a certain set of data when something interesting shows up in one of the data streams being presented. This method has proven easy to use for analysts even with little computer experience, although the procedure of assembling data and coupling it to the model is more difficult and requires understanding of the underlying models. Although this is more a property of the MIND framework than of the actual R&E approach, it is a weakness that the approach does not capture and support this in a satisfying manner.

The main design goals for F-REX are thus to maintain ease of use for analysts and to simplify the process of getting data ready for analysis and presentation. Further, the approach is intended to be very general and usable in many different scenarios, ranging from the strategic level down to the operational level. Bearing that in mind, and the fact that new technology constantly offers new alternatives for data capture in ways that are impossible to foresee, the approach should not rely on any particular data source but be flexible enough to support just about any type of data coming from any source.
As for F-REX Studio, it must support the F-REX
approach fully and offer a platform onto which it is
easy to develop new modules that make use of new
data or visualization techniques as they become
available. One must also bear in mind that the amount
of data available for capture is very likely to continue
to grow and therefore F-REX Studio should not
introduce any restrictions on data capacity. The final
design goal that was defined is the ability to easily
cooperate between analysts so that multiple analysts
can simultaneously work on the same dataset. Again
this is not something directly restricted in R&E, but the
lack of its explicit support explains why it has not been
implemented in MIND.
To sum up, the most important design goals for F-REX and F-REX Studio are flexibility, scalability, cooperability, extensibility and usability.
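The DVD-player-style playback that these goals build on can be sketched as a minimal event engine that dispatches time-stamped events, in order, to registered visualizers; all interfaces here are invented for illustration:

```python
# Sketch of synchronized playback: time-stamped events are popped from
# a min-heap in chronological order and pushed to every registered
# visualizer, so all views follow the same engine clock.

import heapq

class Engine:
    def __init__(self):
        self.queue = []          # (timestamp, event) min-heap
        self.visualizers = []

    def add_event(self, timestamp, event):
        heapq.heappush(self.queue, (timestamp, event))

    def register(self, visualizer):
        self.visualizers.append(visualizer)

    def play(self):
        while self.queue:
            timestamp, event = heapq.heappop(self.queue)
            for v in self.visualizers:
                v(timestamp, event)  # every view sees the same clock

shown = []
engine = Engine()
engine.register(lambda t, e: shown.append((t, e)))
engine.add_event(12.0, "radio message")
engine.add_event(3.5, "photo taken")
engine.play()
print(shown)  # [(3.5, 'photo taken'), (12.0, 'radio message')]
```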
3. The F-REX Approach

The R&E approach [8] is commonly described as a process leading from domain analysis to presentation via modeling, instrumentation and data collection. This same description serves as a base for the F-REX improvements of R&E. The new features in the F-REX approach are highlighted in Figure 1.

The domain analysis, modeling and instrumentation phases remain virtually unchanged from their original definition in R&E. The data collection step, however, has been split into Data collection and Data integration. The Data collection phase is the phase where the actual data is automatically captured or manually collected, according to the plans defined in the Instrumentation phase. Data may consist of scribbled notes, system log files, photographs, multimedia feeds or any other available data. The Data integration phase serves to integrate the captured data with the conceptual model and couple it to the research questions defined in the domain analysis. This data coupling prepares the model for playback by categorizing, sorting and coding data as necessary.

The presentation phase is the final phase of R&E. During the presentation phase the model is played back from start to end and a set of data visualizers are updated as the chain of events unfolds. This allows the audience, e.g. at an AAR, to relive the operation and see what happens at different locations during the entire operation, giving the analyst a chance to relate individual actions to the global picture and draw conclusions that would be impossible from traditional observation at a single location. This enables the analyst to detect anomalies from the expected course of events and other data of particular interest. R&E does not separate presentation from analysis, and it is unclear what the end product of the presentation really is. The F-REX approach tries to remedy this by stating that the presentation step serves as a means to detect interesting data that the analyst may want to investigate.

The analysts will use the presentation feedback as a starting point for their analysis and then dig deeper into the model to try to answer questions or hypotheses. In the case of abstract questions or complex relations between events, parts of the analysis results may be integrated into the model again to enrich the model for a new presentation and new analysis. This turns the Exploration phase into a loop, which will continue until the problems and hypotheses have been properly investigated.
Figure 2. Screenshot of F-REX Studio showing one layout, presenting multimedia, observer notes, statistics and GIS information from a rescue services exercise in northern Sweden 2006.

4. F-REX Software

The main software that has been designed is the F-REX Studio, which replaces MIND for R&E as the main engine for modeling and presentation (Figure 2). A wide range of standalone recording and conversion tools have been developed to support data capture, as well as applications to control and monitor data capture remotely via a network where available.

Data integration is fully integrated into the Studio, and extensions for instrumentation are being planned alongside integration of data capture tools. It has been recognized, however, that it is neither possible nor desirable to fully integrate everything into the Studio; for instance, standalone data capture systems like handheld cameras, voice recorders and proprietary systems may be more practical to operate standalone, importing their data output manually afterwards. Data capture systems that can be connected to an F-REX server in some way, such as NBOT [14] or any network-enabled software, may however benefit from being directly integrated into the Studio to allow for automation of the otherwise labor-intense data integration process.

All data that is imported is automatically synchronized using timestamps from the recorders. However, these timestamps have proven not very trustworthy due to drifting clocks. Therefore F-REX supports a multitude of ways to resynchronize data semi-automatically or manually, depending on the complexity of the clock drifts.

Analysis is partly supported by the F-REX Studio. Some custom analysis tools for certain types of detailed analysis have been built in, and the framework allows for addition of more tools as they become available.
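The resynchronization of drifting recorder clocks can be sketched as follows, assuming a simple linear drift model in which two reference points fix an offset and a rate; this is our illustration, not the actual F-REX procedure:

```python
# Sketch of semi-automatic resynchronization under linear clock drift:
# two known (recorder_time, true_time) pairs determine a mapping from
# the recorder's timeline onto the true timeline.

def drift_correction(ref_points):
    """ref_points: two (recorder_time, true_time) pairs."""
    (r0, t0), (r1, t1) = ref_points
    rate = (t1 - t0) / (r1 - r0)  # true seconds per recorder second
    return lambda r: t0 + (r - r0) * rate

# Hypothetical recorder that runs 1% fast and started 10 s ahead.
correct = drift_correction([(10.0, 0.0), (111.0, 100.0)])
print(correct(60.5))  # recorder timestamp mapped onto the true timeline
```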
5. F-REX Studio Architecture
The F-REX Studio is built as a desktop application
with loosely coupled modules that can communicate
with each other and the framework in a standardized
manner. The main framework architecture is typically
envisaged as a multi-tier architecture [4] with a clear
distinction between the four defined tiers (Figure 3).
The framework implements the tiers and provides
access to basic routines and common visualization
features. Each module on the other hand is
implemented according to the Model View Controller
paradigm (MVC) [11]. The model in this case is
provided by the framework while the view and the
controller are programmed by the module developer,
assisted by the common routines and definitions
available in the framework.
One of the main reasons for using a 4-tier
architecture is the ability to separate the data repository
from the implementation to allow for a modification of
the physical data structure without having to change
the main code. The 4 tiers in the model are therefore
defined as data tier, data access tier, business tier and
presentation tier.
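The payoff of this tier separation can be sketched in a few lines: modules program against a data access interface, so the storage backend can be swapped without touching module code. The interface below is invented for illustration and is not the actual F-REX API:

```python
# Sketch of a data access tier: modules depend only on the abstract
# interface, so the concrete storage (relational database, files,
# object store) can be exchanged freely.

from abc import ABC, abstractmethod

class DataAccess(ABC):
    @abstractmethod
    def events_between(self, start, end):
        """Return events with start <= timestamp < end."""

class InMemoryDataAccess(DataAccess):
    def __init__(self, events):
        self.events = sorted(events)  # (timestamp, payload) pairs

    def events_between(self, start, end):
        return [e for e in self.events if start <= e[0] < end]

store = InMemoryDataAccess([(5.0, "photo"), (1.0, "note"), (9.0, "radio")])
print(store.events_between(0.0, 6.0))  # [(1.0, 'note'), (5.0, 'photo')]
```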
Figure 3. F-REX Studio modeled as a 4-tier architecture with plugin modules interfacing the top three tiers.

5.1. Data Tier

The data tier provides the data storage. Which storage facility to use can be configured at runtime by the user in F-REX Studio. Several experimental solutions have been tested briefly, with object-oriented databases and file-based solutions; however, the preferred solution, which is also the one mostly used, is based on a relational database (Figure 4).

The most central entities in the data tier are Events, Data and Objects. An Event represents the occurrence of new Data, for example a new photo available from a certain camera (represented as a Source). The Event entity contains time, duration and type. The entity will typically be linked to one or more Data entities containing any type of Data related to the Event, for instance photo, position, comment or camera settings. Further, the Event can be linked to any number of Objects, representing for instance the photographer or the photo subject. Coupling data in this way allows for automatic processing and filtering of data to quickly extract useful information.

Figure 4. The main entities and their relations in the F-REX data tier.

5.2. Data Access Tier

The data access tier defines the interfaces that are implemented by the data tier. These interfaces are accessed by the modules and the business tier, allowing uniform access to the data regardless of the implementation used in the data tier. Since all access to the data tier is routed via this tier, different filtering and other useful data manipulation procedures can effectively be handled by the data access tier.

5.3. Business Tier

The business tier is split into two parts: the base services and the plugin support modules. The base services provide the main event engine that, for instance, makes sure all modules are synchronized and loaded with the right data at the right time. It also provides a useful message passing scheme to allow the modules to communicate with each other.

The plugin support modules are basically helper classes and interfaces that assist the programmer in developing plugins that will inherit the F-REX look and feel. Although they are not required, they will help the programmer to quickly get access to the data and functionality supplied by the base services.

5.4. Presentation Tier

All user interfaces are located in the presentation tier. The framework provides a main workspace and docking system in which the plugin module user interfaces will reside. The framework also provides common resources and a menu system with hooks, onto which plugins can attach their own menu items.

5.5. MVC Module Architecture

A typical plugin for F-REX provides a visualizer for a certain type of data and/or events. Existing plugins have typically been developed according to the MVC architecture, with a triangular communication pattern between the model, view and controller. The modules thus interface all the top three tiers of the main architecture (Figure 3).

By using the supplied base modules, the developer is given a sort of sandbox in which to develop a module, where all that is needed is to define a controller that specifies what type of events are to be supplied from the model to the view. The view can then be defined as a user control and the user interface set up as the developer prefers; everything else will be tended to automatically by the support modules. A solution like this has proven very useful for rapid development of new plugin modules.

The most basic plugins that have been implemented include clock displays, timeline, image, audio, video and GIS. All of these plugins contain views that are updated by the controller to synchronize against the engine clock. Among the more specialized plugins is the bookmark plugin, which allows an analyst to save the current state of all visualizers and write a comment that will be tied to the current state. These "bookmarks" are automatically stored and can easily be returned to at a later stage.

The communication plugin is also worth mentioning, as it gives a visual presentation of communication in a network of senders and receivers. The communication plugin was originally designed for radio communication, but the flexibility of the data tier has allowed it to also be used, for instance, for e-mail conversations and IP communication.
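The Event-Data-Object model described in Section 5.1 (Figure 4) can be sketched as a minimal relational schema; table and column names are ours, for illustration only:

```python
# Sketch of the data tier as a relational schema: an Event carries
# time, duration and type, links to any number of Data rows, and links
# to any number of Objects through a join table.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events  (id INTEGER PRIMARY KEY, time REAL, duration REAL, type TEXT);
CREATE TABLE data    (id INTEGER PRIMARY KEY, event_id INTEGER REFERENCES events(id),
                      kind TEXT, payload TEXT);
CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE event_objects (event_id INTEGER REFERENCES events(id),
                            object_id INTEGER REFERENCES objects(id));
""")
conn.execute("INSERT INTO events VALUES (1, 42.0, 0.0, 'photo')")
conn.execute("INSERT INTO data VALUES (1, 1, 'position', '58.4N 15.6E')")
conn.execute("INSERT INTO objects VALUES (1, 'photographer')")
conn.execute("INSERT INTO event_objects VALUES (1, 1)")

# Join an event with its data and linked objects.
row = conn.execute("""
    SELECT e.type, d.payload, o.name
    FROM events e
    JOIN data d ON d.event_id = e.id
    JOIN event_objects eo ON eo.event_id = e.id
    JOIN objects o ON o.id = eo.object_id
""").fetchone()
print(row)  # ('photo', '58.4N 15.6E', 'photographer')
```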
6. Usage

The F-REX approach and tools have been used to successfully evaluate several exercises, for instance tactical army drills [5], strategic HQ staff exercises [12] and rescue services commander training [13]. As a proof of concept, an evaluation of professional football has also been investigated [1].

The F-REX tools and studio have mostly been used for AAR support and post mission analysis (PMA). When supporting AARs, the system has been operated by system experts who are familiar with the tools and methods. The operator assists the AAR facilitator, who uses the F-REX presentation to show the participants what has happened and uses this to support the discussions. This has often helped to raise the discussions from "what happened" to "why did it happen", which is a significant step forward and has been appreciated by AAR facilitators.

For PMA, analysts have typically worked in small groups, analyzing data with more or less traditional qualitative or quantitative methods, using F-REX as a way to navigate through the massive datasets.

In operative work, the F-REX Studio has been used by the Swedish Police to synchronize outputs from surveillance cameras in order to match images and identify suspects. The predecessor, MIND, has also been used by the Swedish Rescue Services Agency to document live operations for feedback and analysis.

Another way of using the F-REX tools is to provide pre-action presentations (PAP) [16], which are similar to an AAR, but the audience is shown a previous exercise or operation and may reflect on events, which may give them an advantage when similar situations occur in their upcoming operation.

7. Future Work

A strength, and at the same time a weakness, of the F-REX approach is the tremendous amount of data that is typically collected during the data capture phase. This leads to substantial work to integrate the data with the model before presentation can begin. For the method to be useful in an AAR context, the presentation should be done shortly after the exercise is finished. Due to the massive amount of labor needed to manually sort and integrate data, the presentation is not always as complete as would be preferred. If the infrastructure allows it, data integration should therefore be automated as much as possible, so that captured data is directly integrated into the model.

Automatic data integration enables another interesting adoption of the F-REX Studio, namely live presentation of data. This would in effect make F-REX Studio a decision support system that could be integrated into a command & control (C2) system.

The roadmap ahead also includes instrumentation support for F-REX Studio that would automatically prepare the data integration system to couple incoming data in accordance with the instrumentation plan. Partial integration of existing tools for control and monitoring of data is also planned. With these two additions the F-REX Studio would support all steps of the F-REX approach and thus become a complete F-REX system.

More plugin modules are also planned, tailored for visualizing and analyzing, for example, communication in a structured manner using the extended Attribute Explorer technique [2], [15], or simple tools to organize and classify events. Other new modules being discussed are 3-D visualization, health monitoring, signal analysis, and visualization of data flow and system communication.

Future work on the F-REX approach includes identification of compatible analysis methods and specifying how F-REX fits into the overall scheme of these methods. Measuring the cost and amount of time needed for high-quality analysis and comparing this to traditional methods is another important task, needed to estimate how useful the F-REX approach is.

8. Conclusions

This paper presents a slight improvement of the model for the R&E approach that better maps onto how researchers and analysts work with massive multimedia-intensive datasets. This model helped in defining a new framework and tool, the F-REX Studio, also described in this paper. The F-REX method and tools have been successfully used to assist multimedia-intense presentations and analyses, such as after-action reviews and post mission analysis, in several exercises and some live operations.

Flexibility is reached through the general definition of instrumentation and data capture that allows for any instruments to be used and any data to be captured. Of course this puts high demands on the F-REX Studio to be flexible in visualization. This is reached through the plugin interface, which allows developers to quickly create new plugins for F-REX visualizing any data in any way imaginable, automatically synchronized with all other views.

Scalability is achieved through the 4-tier architecture, which allows the data access and data tiers to be exchanged for larger data warehouses, using optimized techniques to access relevant data should it be necessary. For now, however, a relational database is used as the backbone, which provides enough performance for the time being.

Cooperability can be reached by using a central resource for the data tier, for instance a network-enabled database, allowing several analysts to work on the same set of data simultaneously.

Extensibility comes from the modular design in the business tier, which allows the programmer to quickly develop new visualization modules and link to the rest of the framework to add new visualization and analysis capabilities.

Usability is mainly a feature of the presentation tier. It is up to the developer to create the user interface for any plugin modules. Using common resources helps the developer to achieve a common look and feel across the modules. The overall usability of the system has not yet been measured and no conclusions can be made about it so far.

9. References

[1] Albinsson, P-A. and Andersson, D., "Computer-aided football training exploiting advances in distributed tactical operations research", Sixth International Conference of the International Sports Engineering Association, (Munich, Germany), Springer, New York, 2006, pp. 185-190.
[2] Albinsson, P-A. and Andersson, D., "Extending the attribute explorer to support professional team-sport analysis," Information Visualization 7, Palgrave journals, doi:10.1057/palgrave.ivs.9500178, 2008, pp. 163-169.
[3] Andersson, D., Pilemalm, S. and Hallberg, N., "Evaluation of crisis management operations using Reconstruction and Exploration", Proceedings of the 5th International ISCRAM Conference, Washington, DC, USA.
[4] Eckerson, W.W., "Three Tier Client/Server Architecture: Achieving Scalability, Performance, and Efficiency in Client Server Applications", Open Information Systems, January 1995.
[5] Hasewinkel, H. and Thorstensson, M., OMF of air mobile battalion during Combined Challange-2006 (in Swedish), Base data report, FOI-R--1982--SE, Swedish Defence Research Agency, Linköping, Sweden, 2006.
[6] Headquarters Department of the Army, A Leader's Guide to After-Action Reviews (TC 25-20), Washington, DC, 30 September 1993.
[7] Jenvald, J., Methods and Tools in Computer-Supported Taskforce Training, Linköping Studies in Science and Technology, Dissertation No. 598, Linköping University, Linköping, Sweden, 1999.
[8] Morin, M., Multimedia Representation of Distributed Tactical Operations, Linköping Studies in Science and Technology, Dissertation No. 771, Linköping University, Linköping, Sweden, 2002.
[9] Morrison, J.E. and Meliza, L.L., Foundations of the After Action Review Process, IDA Document 2332, Institute for Defense Analyses, Alexandria, VA, USA, DTIC/NTIS AD-A368 651, 1999.
[10] Rankin, W.J., Gentner, F.C. and Crissey, M.J., "After action review and debriefing methods: technique and technology", Proceedings of the 17th Interservice/Industry Training Systems and Education Conference, Albuquerque, NM, USA, 1995.
[11] Reenskaug, T., "Models - Views - Controllers", Technical Note, Xerox PARC, 1979.
[12] Thorstensson, M., Albinsson, P.-A., Johansson, M. and Andersson, D., MARULK 2006—Methods for developing functions, units and systems, User Report FOI-R--2188--SE, Swedish Defence Research Agency, Linköping, Sweden.
[13] Thorstensson, M., Johansson, M., Andersson, D. and Albinsson, P-A., Improved outcome of exercises—Methods and tools for training and evaluation at the Swedish Rescue Services school at Sandö, User Report FOI-R--2305--SE, Swedish Defence Research Agency, Linköping, Sweden.
[14] Thorstensson, M., Using Observers for Model Based Data Collection in Distributed Tactical Operations, Linköping Studies in Science and Technology, Thesis No. 1386, Linköping, Sweden: Linköpings universitet, 2008.
[15] Spence, R. and Tweedie, L., "The attribute explorer: information synthesis via exploration", Interacting with Computers 11, 1998, pp. 137-146.
[16] Wikberg, P., Albinsson, P-A., Andersson, D., Danielsson, T., Holmström, H., Johansson, M., Thorstensson, M. and Wulff, M-E., Methodological tools and procedures for experimentation in C2 system development - Concept development and experimentation in theory and practice, Scientific report, FOI-R--1773--SE, Swedish Defence Research Agency, Linköping, Sweden, 2005.
Towards Integration of Different Media in a
Service-Oriented Architecture for Crisis Management
Magnus Ingmarsson
Dept. of Comp. and Inform. Sci.
Linköping University
SE-581 83 Linköping, Sweden
Email: [email protected]

Henrik Eriksson
Dept. of Comp. and Inform. Sci.
Linköping University
SE-581 83 Linköping, Sweden
Email: [email protected]

Niklas Hallberg
FOI Swedish Defence Research Agency
Olaus Magnus v. 42
SE-581 11 Linköping, Sweden
Email: [email protected]
Abstract—Crisis management is a complex task that involves
interorganizational cooperation, sharing of information, as well
as allocation and coordination of available resources and services. It is especially challenging to incorporate new, perhaps
temporary, actors into the crisis-management organization while
continuing to use the same command-and-control (C2) system.
Based on a preceding requirement-analysis study involving interviews and workshops with crisis-management staff, we have
developed a prototype C2 system that facilitates communication,
collaboration, and coordination at the local-community level. A
salient feature of this system is that it takes advantage of a
mash-up of existing technologies, such as web-based mapping
services, integrated in an open service-oriented architecture. By
taking advantage of light-weight solutions capable of running as
web applications within standard web browsers, it was possible
to develop a scalable structure that supports decision making at
multiple levels (operational to tactical) without the need to modify
the system for each level. The use of C2 systems implemented
as web applications creates new possibilities for incorporation of
multimedia components, such as popular web-based multimedia
features. In addition, we discuss the possibility of automatically integrating multimedia services into the C2 system via a service-discovery mechanism, which uses knowledge about the services and the situation to determine which services to display.
I. Introduction

Crisis management at the local-community level is challenging in many ways [1]. Two of the most significant challenges
are: (1) The management and coordination of external actors
with regards to participation in solving the crisis situation and
(2) the design and use of the command and control (C2) system
for handling daily activities as well as extreme events. The
first challenge is commonly handled by using human actors
as intermediaries between the crisis-management system and
the crisis-management staff. Typically, the second challenge
is addressed by employing dedicated C2 systems for crisis management.
A disadvantage of employing dedicated C2 systems, however, is that they are used exclusively in serious situations,
which means relatively infrequent use. Infrequent use leads
to uncertainty among the operators of how to perform certain
actions within the system, which affects overall crisis-response
performance. Furthermore, infrequent use contributes to a lack
of knowledge about how the systems perform in real situations.
In crisis situations, time is a critical factor. Frequently, it
is the case that different C2 systems as well as other information systems must interact on an ad-hoc basis. Often, these
systems cannot interchange data or interpret data that other
systems provide. In practice, these inabilities are currently
handled by human intermediaries and liaison staff between
the crisis-management organization and the systems employed
by the external actors. For example, if the crisis-management
organization needs transportation, the staff is forced to contact
the transportation companies directly by telephone, since the
crisis-management organization does not have direct access
to, or knowledge about, the systems employed by the transportation companies and the transportation resources currently
available [2]. This type of ad hoc communication is sometimes
a bottleneck because it draws personnel resources.
Although C2 systems can assist response commanders in
situation awareness, planning, and resource allocation [3], the
traditional approach to C2 systems may lead to extensive
system-development times as well as difficulties in integrating
the different actors and their heterogeneous systems. Unless
system designers have a substantial comprehension of the
different actors involved as well as their objectives, activities
and information needs, the result will be systems ill-suited
to the task. Furthermore, it is essential that the different
actors in the local community can synchronize, coordinate, and
distribute resources [4]. Moreover, it is important to integrate
local and regional resources from, for example, fire and rescue
services, police force, and medical-care services in the overall
crisis response. Today, it is possible to develop lightweight C2
systems that facilitate cooperation based on state-of-the-art
web technologies. Such web applications can integrate new
services, including multimedia, in novel ways. For instance,
C2 systems implemented as web applications can relatively
easily support extensions consisting of a mash-up of web
components from different sources.
Although the aforementioned challenges (such as the cooperation between different actors) are significant, the incorporation of multimedia into C2 systems may help in addressing
them. However, the incorporation of multimedia in traditional
C2 systems has been challenging and difficult. This obstacle
is particularly problematic because situational awareness is
essential to crisis management.
To create C2 systems that work in real situations, it is
necessary to incorporate grounded theory. A common theory
used in planning for this type of situation is the OODA
loop [5]; see Figure 1. The OODA loop states that there
are different phases in the decision-making process. These
phases are: Observe, Orient, Decide, and Act. To achieve a
successful outcome from the decision-making process, it is
important to support the different phases properly. To provide
this support, the C2 system used by the commanding staff must
be OODA-loop aware in that it supports the different phases in
an integrated way. In Section VII, we discuss this need in detail
and present how our model and current implementation tackle
this issue. There have been many enhancements to the original
OODA loop. Brehmer [6] proposed the Dynamic OODA
(DOODA) loop model, which introduces what he refers to
as “additional sources of delay” in the process. Examples of
such types of delay are information delay, which is the time
between actual outcome and the decision-maker being aware
of it; dead time, which is the time between the initiation of an
act and its actual start; and time constant, the time required to
produce results.
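The three DOODA delay sources can be expressed as simple time arithmetic. The following sketch is purely illustrative (the timestamps are invented, not from Brehmer's paper):

```python
# Illustrative sketch of the DOODA delay sources. All timestamps are
# hypothetical, in minutes from the start of an incident.
outcome_occurred = 10      # the actual outcome happens in the field
outcome_reported = 25      # the decision-maker becomes aware of it
order_issued = 30          # an act is initiated (order given)
act_started = 42           # the act actually starts
results_visible = 60       # the act produces observable results

information_delay = outcome_reported - outcome_occurred  # time to awareness
dead_time = act_started - order_issued                    # initiation to start
time_constant = results_visible - act_started             # time to produce results

# Total lag between an outcome and visible counter-effects:
total_lag = (information_delay + (order_issued - outcome_reported)
             + dead_time + time_constant)
print(information_delay, dead_time, time_constant, total_lag)
```

Note that the total lag is simply the span from the outcome to visible results; the decomposition shows where in the loop the delay accumulates.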
As described in Section I, the challenges are system-related as well as organizational. The proposed solution is based on a combination of several existing technologies, and consists of two parts: RESPONSORIA and MAGUBI. RESPONSORIA is a prototype C2 system implemented as a web application. It is responsible for the interaction and connectivity between different services, devices, and users, once they are selected for inclusion in the situation [7]. MAGUBI is responsible for service/device/actor discovery and recommendation of different services/devices/actors. Sections IV and VI describe RESPONSORIA and MAGUBI, respectively.
To better understand the potential of our model, we present the basic concepts and ideas behind it. In its most simplified form, the RESPONSORIA model is a Service-Oriented Architecture (SOA), which uses web services as the basis for the entire system. A web-based user interface retains a desktop look-and-feel through the use of JavaScript while keeping the solution accessible through standard web browsers, see Figure 2. A proxy in the web server enables communication with other components, such as application servers in RESPONSORIA. The pluggable structure extends to the application servers as well. Furthermore, RESPONSORIA utilizes Enterprise JavaBeans (EJBs) in the form of web services for mapping, note-taking, logging, etc. It is straightforward to expand the system by incorporating other web services.
The application server currently employed is Glassfish 2, a Java-based application and web server. Glassfish is, like the rest of RESPONSORIA, open source and benefits from the ability to run on a multitude of platforms and architectures, something which has been verified during development.
As mentioned, there is a desktop feel to the application itself. This is obtained by using the Google Web Toolkit (GWT), which enables developers to use Java syntax to program an entire web application. Through post-processing, the application is transformed into a JavaScript application suitable for web browsers. In essence, the developer can program as accustomed when programming a common desktop application, but still deploy it as a web-based application. The RESPONSORIA client has been successfully tested on Apple OS X desktop, iPhone, MS Windows XP, Firefox, Internet Explorer, and Safari. We foresee that the system will work on most of the high-grade, hand-held machines currently available.
Since the system utilizes Java EE web services, it facilitates their cross-platform distribution in much the same manner as the main program. The Java EE platform also comes with a host of features for portability, quality of service, and security.
1 Although the OODA loop was originally designed for military situations, it is used in many other areas as well.
The basis for the development of the prototype user interface
is the set of requirements identified by Pilemalm and Hallberg
[8]. Figure 3 shows the main view in the user interface.
To the left is the service selection panel (A). This panel
lists the available resources, devices and services. We have
incorporated different layouts for inclusion of the different
resources. The first type of listing is an alphabetical one.
Another type of listing is based on the order in which a specific
task is carried out. A third type of listing may be based on
recommendations from the service-discovery system. A menu
bar is placed immediately above the A panel. This enhances
the perception of the application as a desktop application. This
perception is especially prevalent if used in conjunction with
a full-screen capable browser.
In the service panel itself, the currently selected activity is
shown (B), see Figure 3. As can be seen, there are tabs
that enable the user to work with many different activities at
the same time. Furthermore, as shown, the service panel itself
also provides opportunity for incorporating different media and
services. Figure 3 (B) illustrates how the system uses Google Maps together with a web service that tracks mobile phones.
Status information is displayed in the panel to the right,
Figure 3 (C). Currently, three tabs display various information,
such as Request status, Activity status, and Task status. One
particularly important feature is the log, Figure 3 (D), which
also contains a note-taking function. In order to enhance situation awareness, this log is designed to be shared by everybody using the system. It enables anyone to review what has happened, when it happened, as well as who did what.

Fig. 1. The OODA loop (in Brehmer [6]). The OODA loop stands for Observe, Orient, Decide, and Act. Normally, it refers to a single person doing this cycle. However, the OODA loop can also be used when referring to organizations. Ultimately, the OODA loop describes how an individual or organization reacts to an event.

Fig. 2. Architecture of the RESPONSORIA system. (A) Web-based user-interface client. (B) Server cloud consisting of a collection of implemented web services running on application servers.
While the RESPONSORIA model handles the usage of the services and the GUI, the MAGUBI model handles service discovery [9]. Since MAGUBI is targeted towards ubiquitous computing, it works well in crisis-management situations that have many actors, services, and different types of media.
Service discovery in MAGUBI can be performed in two ways:
1) User activated. By specifying the service or device that the user is looking for, as well as their potential properties and priorities, the user can instruct MAGUBI to search for matches.
2) Automatic. MAGUBI performs the service discovery itself. By using models that describe the user and the world, it is able to decide which services are of interest to the users, and subsequently execute searches for them proactively.
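The two discovery modes can be sketched as follows. This is an illustrative Python sketch with an invented data model, not the actual MAGUBI API (the field names and the `current_task_needs` model are assumptions):

```python
# Assumed, simplified service descriptions for illustration.
services = [
    {"name": "taxi-a", "type": "transport", "seats": 4},
    {"name": "bus-1", "type": "transport", "seats": 40},
    {"name": "cam-7", "type": "video", "seats": 0},
]

def user_activated(services, wanted_type, **properties):
    """User-activated mode: the user specifies the service type
    and required minimum values for its properties."""
    hits = [s for s in services if s["type"] == wanted_type]
    return [s for s in hits
            if all(s.get(k, 0) >= v for k, v in properties.items())]

def automatic(services, user_model):
    """Automatic mode: the system infers interesting service types
    from a model of the user and the current task."""
    interesting = user_model["current_task_needs"]
    return [s for s in services if s["type"] in interesting]

assert [s["name"] for s in user_activated(services, "transport", seats=10)] == ["bus-1"]
assert [s["name"] for s in automatic(services, {"current_task_needs": {"video"}})] == ["cam-7"]
```

The difference is only in who formulates the query: the user (mode 1) or the user/world model (mode 2); the matching machinery is shared.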
Figure 4 shows the MAGUBI model, which comprises two parts: MAGUBI and ODEN. The whole model is named MAGUBI since it is the controlling module. The two parts are surrounded by aiding modules. Starting from the bottom up in Figure 4, we can see the services and devices themselves. A peer-to-peer (P2P) subsystem keeps track of these services and devices.
ODEN is the subsystem responsible for the user-activated or more traditional service discovery. By using ontologies, ODEN is able to expand on the traditional concepts for semantic models of devices and services. After using a P2P subsystem to download semantic descriptions provided by the services and devices themselves, it evaluates them locally on the client. After evaluation, the results may be presented to the user, or post-processed in the MAGUBI module.
Fig. 4. The MAGUBI model, showing user knowledge, world knowledge, service properties, and the device and service ontology.
Fig. 3. RESPONSORIA main user interface: (A) Service selection panel, with preconfigured service groups for different scenarios. (B) Service panel, showing the Mobile phone positioning service; in this case the service panel shows a trace of a mobile phone for the last five hours. (C) Status panel, showing progress for different requests as well as tasks and activities. (D) Log, showing all activity in the system as well as providing a note-taking function.
The MAGUBI module may either post-process results from ODEN, or initiate searches on the user's behalf. In the case of post-processing, MAGUBI inspects the results from ODEN and compares them to semantic information stored in its ontologies pertaining to the world, devices, services, and the users. As an example, the user may try to locate transportation in the form of a taxi. In an ordinary service-discovery system, the user will get a long list of available taxis. Using the ODEN subsystem, the user gets a shorter list, tailored to the exact specifications of the required properties that the user provided. MAGUBI goes one step further and may, for example, filter out taxis that fulfill the required transportation properties but will soon require refueling, and as such are realistically unusable, since there is more to the transportation service than merely being able to start it.
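The taxi example can be sketched as a two-stage filter. This is a hedged illustration with invented field names, not the actual ODEN/MAGUBI implementation:

```python
# Assumed service descriptions: seats is a declared property,
# fuel_km_left is world knowledge available to the post-processing stage.
taxis = [
    {"id": "T1", "seats": 4, "fuel_km_left": 300},
    {"id": "T2", "seats": 4, "fuel_km_left": 8},
    {"id": "T3", "seats": 2, "fuel_km_left": 250},
]

def oden_match(taxis, min_seats):
    # Traditional discovery: match only on the requested properties.
    return [t for t in taxis if t["seats"] >= min_seats]

def magubi_filter(candidates, trip_km):
    # World-knowledge rule: a taxi that cannot complete the trip
    # without refueling is dropped, even though it "matched".
    return [t for t in candidates if t["fuel_km_left"] >= trip_km]

matched = oden_match(taxis, min_seats=4)     # T1 and T2 match on seats
usable = magubi_filter(matched, trip_km=40)  # only T1 is realistically usable
assert [t["id"] for t in matched] == ["T1", "T2"]
assert [t["id"] for t in usable] == ["T1"]
```

The first stage corresponds to ODEN's property matching; the second to MAGUBI's post-processing against world knowledge.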
For the proactive part, MAGUBI may commence searches for services and devices that it judges appropriate for the user. These searches are based on the user and world models, in cooperation with the rule engine and its rules. Naturally, these searches are carried out through the ODEN subsystem and are subjected to the same post-processing as user-initiated ones.
At the head of the MAGUBI system is the GUI/DUI module [10]. In this case, it may be integrated into the web interface and accept requests from the user as well as present results from proactive searches that the MAGUBI module may perform independently of the user.
RESPONSORIA supports the OODA loop in multiple ways. First, it supports the first run-through of the OODA loop by providing a rich environment in which to conduct observations. It is worth noting that RESPONSORIA also allows observations to be performed from the field directly in the tool, thus supporting the orient part of the OODA loop. Second, it integrates tools for making sense of the observed data, such as the possibility to quickly visualize numbers as charts (see Figure 5), further aiding in the orient and decide parts of the OODA loop. Third, it provides means to effect orders onto the situation, supporting the act part of the OODA loop.
B. Integration of technologies into RESPONSORIA through service discovery
As mentioned above, the integration of different technologies through different services is a key factor in creating a viable crisis-management system. This integration may be performed in different ways:
1) Manual integration. In its simplest form, we are able to integrate technologies and services by just adding URLs. This may even be performed by users, either individually or together. In essence, a wiki-type interface is created in which the users construct the application in concert.
2) Automatic integration. While manual integration certainly is possible, automatic integration is preferable. One of the most important reasons for automatic integration is the labor savings it creates. To obtain automatic integration, we propose the use of service-discovery systems such as MAGUBI.
The RESPONSORIA model, especially the web-based user interface, can benefit from both manual and automatic integration. Today's web browsers are capable of displaying
and utilizing a wide range of media and technologies out of the
box. In our solution we have focused on technologies built into
the browser such as JavaScript, JPG, PNG, etc. Specialized
data formats may be converted using a web service.
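The manual, wiki-style integration described above can be sketched as a shared registry that users extend simply by contributing URLs. This is an illustrative sketch (the registry structure and example URLs are invented; in RESPONSORIA this would sit behind a web service):

```python
# A shared, wiki-style service registry, built up by the users themselves.
registry = {}

def add_service(name, url, added_by):
    """Any user can integrate a new service by just adding its URL."""
    registry[name] = {"url": url, "added_by": added_by}

def list_services():
    """Services visible to everyone, in a stable order."""
    return sorted(registry)

add_service("phone-tracker", "https://example.org/track", added_by="operator-1")
add_service("er-load-chart", "https://example.org/chart", added_by="operator-2")
assert list_services() == ["er-load-chart", "phone-tracker"]
```

The point is the low barrier: integrating a service costs one URL, and the application is constructed in concert by its users.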
C. Potential multimedia technologies
Since situation awareness (or orientation, as specified in
OODA) is one of the highest priorities when addressing a
crisis, we will briefly mention some resources and technologies
that may enhance this while still being easily integrated into
the main system. We will also relate these techniques to
the OODA loop. By having a web-based crisis management
system it is possible to rapidly tie in new services as they
become available.
1) Personal video streaming: One of the possible technologies that is easy to integrate into RESPONSORIA is live video streaming. Services such as Bambuser [11], Qik [12], Flixwagon [13], and Kyte allow the user to broadcast live
video over the internet using their own mobile phone as
a transmitter. This means that every cellular phone is now
a potential live-coverage camera in a crisis situation. This
technology is instrumental in the observe and orient parts of
the OODA loop.
2) Online charting: Another possible technology is charting applications, for instance the chart API from Google, as shown in Figure 5, or Complan [14], as can be seen in Figure 6.
It becomes easy to integrate this type of multimedia by merely
including a URL. Apart from rapid integration via URLs,
it is also possible to convert textual data into diagrams on the
fly. These charts may be rapidly created using web services or
webpages that feature simple user-interface components, such
as drop-down menus.
A possible drawback when using the simpler URL method
is that the amount of data passed to the graphing application
may cause the web server to report an error as the URL length
expands beyond the web servers’ limit. Nevertheless, it should
be noted that the simple URL method does provide a rapid and
uncomplicated way of producing charts from data.
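The chart-by-URL approach, and its length caveat, can be sketched as follows. The parameter names (cht, chd, chs, chtt) follow the classic Google Chart API query style visible in Figure 5, but the values here are invented for illustration:

```python
from urllib.parse import urlencode

def chart_url(values, title, size="300x150"):
    """Encode a data series as a Google-Chart-API-style URL:
    the chart is produced entirely by a third party from the URL."""
    params = {
        "cht": "bvs",                                # chart type: vertical bars
        "chd": "t:" + ",".join(str(v) for v in values),  # data series
        "chs": size,                                 # chart size in pixels
        "chtt": title,                               # chart title
    }
    return "http://chart.apis.google.com/chart?" + urlencode(params)

url = chart_url([10, 40, 70], "Emergency room load")
assert url.startswith("http://chart.apis.google.com/chart?")
assert "chd=t%3A10%2C40%2C70" in url

# The caveat from the text: a long data series can push the URL past a
# web server's length limit (often around 2048 characters).
long_url = chart_url(list(range(1000)), "big")
assert len(long_url) > 2048
```

Because the chart exists only as a URL, storage, bandwidth, and rendering costs fall on the third-party chart service rather than on the crisis-management center.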
Furthermore, the storage requirements of these diagrams are
small, due to the fact that they exist as URLs. This existence
by URL also has the added benefit of saving bandwidth and
computing time for the crisis management center since pictures
will not be served from the crisis management’s own data
center but from a third party.
Fig. 5. Chart generated directly from a URL (truncated: …,y&chtt=Emergency%20room%20…). The emergency room load is shown for the different locations as none, medium, or high. The size of the X indicates relative waiting time.
Fig. 6. Complan showing different tasks to be performed in a crisis scenario
and when to do them.
With regard to the OODA loop, online charting fits in the decide part, since it provides supportive information regarding which direction to go.
3) Online animations: Using technologies such as OpenLaszlo enables data from formats such as XML to be converted into, for instance, Flash or DHTML, making it easily accessible online [15]. Ming [16] is a similar framework that generates Flash on the fly.
In this section, we discuss the redundancy feature of RESPONSORIA and the service-discovery mechanism that facilitates the integration of different technologies, media, and services.
A. Redundancy
There are several layers of redundancy in our model. As seen in part A of Figure 2, even though the web server is the weakest link in the concept, the server side can be hardened through off-the-shelf web-server technology solutions, such as backup servers that automatically engage if the main server fails, and other Java EE features [17]. Part B of Figure 2 shows an example of the application servers. It is very likely that there will be a surplus of servers offering similar, if not identical, services. Through the use of service discovery such as MAGUBI, rapid recovery is ensured if services fail.
B. Service Discovery
The technical aspects of service discovery through the use of the custom-built application MAGUBI have been described in Section VI. Here, we will address the non-technical part of MAGUBI, namely its philosophical underpinnings. MAGUBI is a service-discovery model and implementation that addresses service discovery from the perspective of the user rather than the system. This perspective connotes an attempt to address the issue of discovering and selecting services.
Many traditional service-discovery systems only address the discovery part of service discovery. On one hand, this focus helps the user, since services are discovered. On the other hand, it leaves the user wanting when it comes to service selection, since the user has to do the evaluation and selection manually. With MAGUBI, this evaluation and selection is offloaded from the user onto the service-discovery system. Furthermore, since MAGUBI has information about the world, the situation, and the users, it is able to make proactive suggestions of services based on what it deems necessary at the present time.
C. Multimedia
As mentioned above, different multimedia services support different parts of the OODA loop. A service provider may aid in the configuration of an interface by providing information in the service descriptions about where in the OODA loop his/her particular service fits in. MAGUBI supports automatic classification into the different OODA categories depending on which rules are entered. By having this ability, we believe that greater efficiency is achieved regarding where in the GUI to position available services, as well as in deciding which services to include in the GUI in the first place.
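Rule-based OODA classification of services can be sketched as follows. The tags and rules here are invented for illustration; the actual MAGUBI rules are not specified in the text:

```python
# Assumed rules mapping service-description tags onto OODA phases.
RULES = {
    "video-stream": "observe",
    "chart": "orient",
    "planning": "decide",
    "dispatch": "act",
}

def classify(service):
    """Classify a service into an OODA phase from its description tag."""
    return RULES.get(service["tag"], "unclassified")

services = [
    {"name": "Bambuser feed", "tag": "video-stream"},
    {"name": "ER load chart", "tag": "chart"},
    {"name": "Taxi dispatch", "tag": "dispatch"},
]

# Group services by phase, e.g. to decide where in the GUI to place them.
by_phase = {}
for s in services:
    by_phase.setdefault(classify(s), []).append(s["name"])

assert by_phase["observe"] == ["Bambuser feed"]
assert by_phase["act"] == ["Taxi dispatch"]
```

Grouping by phase gives the GUI a principled layout criterion: services supporting the same OODA phase end up together.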
We have presented an approach to C2 systems for interorganizational cooperation at the local-community level. The
prototype system is a web application, which means that it
does not require client installation. The use of web browsers
running on standard hardware makes it highly available to
users in crisis situations. Furthermore, this approach enables
relatively straightforward incorporation of rich multimedia into
the C2 system as well as mash-ups of multimedia components.
The incorporation of multimedia components can be done in
different ways—both manually and automatically.
A service-discovery system can potentially facilitate the
automatic discovery and inclusion of services by using knowledge about the situation and the services available, as well as
general world information. The prospect of proactive inclusion
of services and multimedia through the service-discovery
system is appealing. We believe that the service-discovery
view of multimedia mash-ups, combined with rapid inclusion
and dismissal of actors and services, can be used to develop
new types of dynamic C2 systems. Moreover, we believe that
it is important for the C2 system to be aware of the general
C2 method used (for instance the OODA and DOODA loops)
and to provide focused support for the different stages of the
decision-making process.
This work was made possible by grants from the Swedish Emergency Management Agency (KBM). We thank Ola Leifler and Jiri Trnka for valuable discussions and suggestions for improving the manuscript.
[1] G. D. Haddow and J. A. Bullock, Introduction to Emergency Management. Butterworth-Heinemann, Boston, MA., 2006.
[2] D. Mendonça, T. Jefferson, and J. Harrald, “Collaborative adhocracies
and mix-and-match technologies in emergency management,” Communications of the ACM, vol. 50, no. 3, pp. 44–49, 2007.
[3] E. Jungert, N. Hallberg, and A. Hunstad, “A service-based command and
control systems architecture for crisis management,” The International
Journal of Emergency Management, vol. 3, no. 2, pp. 131–148, 2006.
[4] S. Y. Shen and M. J. Shaw, “Managing coordination in emergency
response systems with information technologies,” in Proceedings of the
Tenth Americas Conference on Information Systems, New York, NY,
USA, 2004.
[5] G. T. Hammond, The Mind of War: John Boyd and American Security.
Washington D.C., U.S.A.: Smithsonian Institution Press, 2001.
[6] B. Brehmer, “The dynamic OODA loop: Amalgamating Boyd’s OODA loop and the cybernetic approach to command and control,” in 10th International Command and Control Research and Technology Symposium, McLean, Virginia, U.S.A., 2005.
[7] M. Ingmarsson, H. Eriksson, and N. Hallberg, “Exploring development of service-oriented C2 systems for emergency response,” in Proceedings of the 6th International ISCRAM Conference, Gothenburg, Sweden, J. Landgren, U. Nulden, and B. V. de Walle, Eds., May 2009.
[8] S. Pilemalm and N. Hallberg, “Exploring service-oriented C2 support for emergency response for local communities,” in Proceedings of ISCRAM 2008, Washington, DC, 2008.
[9] M. Ingmarsson, Modelling User Tasks and Intentions for Service Discovery in Ubiquitous Computing. Ph. Lic. Thesis, Linköpings universitet,
[10] A. Larsson and M. Ingmarsson, “Ubiquitous information access through
distributed user interfaces and ontology based service discovery,” in
Multi-User and Ubiquitous User Interfaces at MU3I-06, A. Butz,
C. Kray, A. Krüger, and C. Schwesig, Eds., 2006.
[11] (2009, 03). [Online]. Available:
[12] (2009, 03). [Online]. Available:
[13] (2009, 03). [Online]. Available:
[14] O. Leifler, “Combining Technical and Human-Centered Strategies for
Decision Support in Command and Control — The ComPlan Approach,”
in Proceedings of the 5th International Conference on Information
Systems for Crisis Response and Management, May 2008.
[15] (2009, 04). [Online]. Available:
[16] (2009, 04). [Online]. Available:
[17] (2009, 03). [Online]. Available:
An Analysis of Two Cooperative Caching Techniques for Streaming Media in
Residential Neighborhoods
Shahram Ghandeharizadeh, Shahin Shayandeh, Yasser Altowim
[email protected], [email protected], [email protected]
Computer Science Department
University of Southern California
Los Angeles, California 90089
Domical is a recently introduced cooperative caching
technique for streaming media (audio and video clips) in
wireless home networks. It employs asymmetry of the available link bandwidths to control placement of data across the
caches of different devices. A key research question is what
are the merits of this design decision. To answer this question, we compare Domical with DCOORD, a cooperative
caching technique that ignores asymmetry of network link
bandwidths in its caching decisions. We perform a qualitative and quantitative analysis of these two techniques.
The quantitative analysis focuses on startup latency, defined
as the delay incurred from when a device references a clip
to the onset of its display. The obtained results show that Domical enhances this metric significantly when compared with
DCOORD inside a wireless home network. The qualitative
analysis shows DCOORD is a scalable technique that is appropriate for networks consisting of many devices. While
Domical is not appropriate for such networks, we do not
anticipate a home network to contain more than a handful of
wireless devices.
1. Introduction
Advances in mass-storage, networking, and computing
have made streaming of continuous media, audio and video
clips, in residential neighborhoods feasible. Today, the last-mile limitation has been resolved using a variety of wired
solutions such as Cable, DSL, and fiber. Inside the home,
computers and consumer electronic devices have converged
to offer plug-n-play devices without wires. It is not uncommon to find a Plasma TV with wireless connectivity to
a DVD player, a time shifted programming device (DVR)
such as Tivo, a cable set-top box, a game console such as
Xbox, and a computer or a laptop. The primary constraint
of this home network¹ is the radio range of devices and the
available network bandwidth connecting devices.
The popularity of wireless in-home networks is attributed to consumer demand for no wires, the ease of deploying a wireless network, and the inexpensive plug-n-play components that convert existing wired devices into wireless ones. A device
might be configured with an inexpensive² magnetic disk
drive and provide hybrid functionalities. For example, a
cable box might be accompanied with a magnetic disk drive
and provide DVR functionalities [8]. A device may use its
storage to cache content.
DCOORD [1] and Domical [5] are two cooperative
caching techniques for residential neighborhoods. While
DCOORD is designed for home gateways in a neighborhood, Domical targets devices inside the wireless home. A
qualitative comparison of these two techniques is shown in
Table 1. This table shows DCOORD assumes abundant network bandwidth and employs a decentralized hash table to
scale to hundreds and thousands of home gateways in a residential neighborhood. Domical, on the other hand, targets
an in-home network consisting of a handful of devices.
Both DCOORD and Domical partition the available storage space of a device into two areas: a) private space, and b)
cache space. The private space is for use by the client’s applications. Both techniques manage the cache space of participating devices and their contents. A parameter, α, controls what fraction of cache space is managed in a greedy
manner. When α=0, the device is fully cooperative by contributing all of its cache space for collaboration with other
devices. When α=1, the device acts greedy by using a technique such as LRU or DYNSimple [4] to enhance a local
optimization metric such as cache hit rate. Both DCOORD
and Domical support these extreme and intermediate α values.
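As a concrete illustration of the α split described above, here is a minimal sketch (not the actual DCOORD or Domical code; the class and clip names are invented, and the greedy partition is managed with plain LRU):

```python
from collections import OrderedDict

class AlphaCache:
    """Sketch of a device cache split by the parameter alpha: a fraction
    alpha is managed greedily with LRU, while the remaining 1 - alpha is
    left to the cooperative placement technique."""

    def __init__(self, capacity, alpha):
        self.greedy_capacity = int(capacity * alpha)          # LRU-managed units
        self.coop_capacity = capacity - self.greedy_capacity  # cooperative units
        self.greedy = OrderedDict()   # clip -> size, in LRU order
        self.coop = {}                # clip -> size, managed externally
        self.greedy_used = 0

    def access_greedy(self, clip, size):
        """Reference a clip in the greedy partition, evicting by LRU."""
        if clip in self.greedy:
            self.greedy.move_to_end(clip)   # mark as most recently used
            return
        while self.greedy and self.greedy_used + size > self.greedy_capacity:
            _, evicted_size = self.greedy.popitem(last=False)  # evict LRU clip
            self.greedy_used -= evicted_size
        if self.greedy_used + size <= self.greedy_capacity:
            self.greedy[clip] = size
            self.greedy_used += size

# alpha=0.6 on a 100-unit cache: 60 units greedy, 40 units cooperative
cache = AlphaCache(capacity=100, alpha=0.6)
for clip in ["a", "b", "a", "c"]:        # 30-unit clips; "b" becomes LRU
    cache.access_greedy(clip, 30)
print(sorted(cache.greedy))              # ['a', 'c']
```

With α=1 the whole cache would behave like the greedy partition; with α=0 all of it would be handed to the cooperative technique.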
1 Power becomes a constraint when a mobile device is removed from
the network for use outside the home.
2 The cost per Gigabyte of magnetic disk is less than 10 cents for 1.5
Terabyte disk drives.
Figure 1. Different α values, Domical vs. DCOORD, UK1X, μ = 0.73: 1.a) startup latency (average δ); 1.b) data availability (%).
DCOORD and Domical have different objectives. While
Domical strives to minimize the likelihood of bottleneck
link formation in a wireless network, DCOORD strives to
maximize both the cache hit rate of each node and the number of unique clips stored across the nodes of a cooperative group. In addition, their design decisions are different. DCOORD caches data at the granularity of a clip while
Domical supports caching at the granularity of both clips
and blocks. (Section 2 shows block caching enhances the
startup latency observed with Domical.) Finally, DCOORD
chooses victim objects using a recency metric while Domical considers both the frequency of access to objects and
their size.
Since Domical was designed for use with a handful
of devices, it may not substitute for DCOORD outside
the home when the neighborhood consists of hundreds of
households. This raises the following interesting question:
Is it possible for DCOORD to substitute for Domical inside
a wireless home? The short answer is “No” because of
the asymmetric bandwidth of the wireless links between devices. To elaborate, a recent study [7] analyzed deployment
of six wireless devices in different homes in the United States
and England. It made two key observations. First, the bandwidth of wireless connections between devices is asymmetric. Second, this study observed that an ad hoc communication provides a higher bandwidth when compared with a
deployment that employs an access point because it avoids
the use of low bandwidth connection(s).
The primary contribution of this study is to quantify the
merits of a cooperative caching technique such as Domical
that controls placement of data across devices by considering the asymmetry of their wireless link bandwidths. We
use DCOORD as a comparison yardstick because it is the
Table 1. A qualitative analysis (criteria include: limited network bandwidth, use of object size, data granularity).
only cooperative caching technique that is comparable to
Domical. Obtained results show that Domical significantly enhances the startup latency observed by different devices. This implies that an appropriate cooperative caching technique for a wireless home network should consider the bandwidth configuration between different devices.
A secondary contribution is to highlight caching of data
at the granularity of blocks when network bandwidth and
storage are abundant. With a cooperative technique such as
Domical, block (instead of clip) caching enhances startup latency.
To the best of our knowledge, no study quantifies the performance of two different cooperative caching techniques
for streaming media in a wireless home network. Due to
lack of space, we have eliminated a discussion of other cooperative caching techniques and refer the interested reader
to [6] for this survey.
The rest of this paper is organized as follows. Section 2
provides a quantitative comparison of these two techniques.
We conclude with future research directions in Section 3.
2. A simulation study
When one compares Domical and DCOORD, the following natural questions arise: Is it possible for DCOORD to
Figure 2. Percentage improvement in startup latency by Domical in comparison with DCOORD, α=0: 2.a) UK1X; 2.b) μ = 0.73.
substitute for Domical? And, if Domical is better, then how
much better is it? To answer these questions, we built a simulation model of both DCOORD and Domical. This model
assumes a household consisting of six wireless devices with
wireless network bandwidths identical to those of a United
Kingdom household reported in [7]. This household is denoted as UK1X. We scale down the link bandwidths by a
factor of 2 and 4 to construct two hypothetical households,
UK0.5X and UK0.25X.
We assumed a heterogeneous repository consisting of
864 clips. All are video clips belonging to two media types
with display bandwidth requirements of 2 and 4 Mbps. The
432 clips that constitute each media type are evenly divided
into those with a display time of 30, 60, and 120 minutes.
The total repository size, SDB , is fixed at 1.29 Terabytes.
Each device is configured with the same amount of cache
space and the total size of this cache in the network is ST .
In our experiments, we manipulate the value of ST by reporting the ratio ST/SDB.
We use a Zipf-like distribution [2] with mean of μ to generate requests for different clips. One node in the system is
designated to admit requests in the network by reserving
link bandwidth on behalf of a stream. This node, denoted
Nadmit , implements the Ford-Fulkerson algorithm [3] to reserve link bandwidths. When there are multiple paths available, Nadmit chooses the path to minimize startup latency.
The simulator conducts ten thousand rounds. In each
round, we select nodes one at a time in a round-robin manner, ensuring that every node has a chance to be the first
to stream a clip in the network. A node (say N1 ) references a clip using a random number generator conditioned
by the assumed Zipf-like distribution. If this clip resides in
N1 ’s local storage then its display incurs a zero startup latency. Otherwise, N1 identifies those nodes containing its
referenced clips, termed candidate servers. Next, it con-
tacts Nadmit to reserve a path from one of the candidate
servers. Nadmit provides N1 with the amount of reserved
bandwidth, the paths it must utilize, and how long it must
wait prior to streaming the clip. This delay is the incurred
startup latency.
Performance results: Figure 1.a shows the average startup
latency with Domical and DCOORD as a function of different α values. When compared with one another, Domical
enhances average startup latency by approximately 40% to
50%. It is interesting to note that Domical results in higher
availability of data for α values less than 0.6, see Figure 1.b.
This means the dependencies between the caches of different devices (constructed by Domical) are effective in maximizing the number of unique clips in the home network.
With α = 1, DCOORD provides a higher availability because (a) it employs a hash function to assign clips to nodes,
and (b) when a clip assigned to Ni is referenced by a neighboring device, Ni places this clip as the next to be evicted
from Ni ’s local storage. Such a mechanism does not exist
with Domical.
Domical provides a lower startup latency than DCOORD
because it assigns the frequently accessed clips to the device
with the highest out-going link bandwidths. This minimizes
the formation of bottleneck links in the wireless network,
reducing the possibility of a device waiting for an active
display of a clip to end.
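The placement intuition can be illustrated with a toy routine (hypothetical: the actual Domical algorithm [5] builds cache dependencies and is more involved; the device names, bandwidths and capacities below are invented):

```python
def place_clips(clip_freqs, node_out_bw, node_capacity):
    """Greedy sketch: sort clips by access frequency and nodes by total
    outgoing link bandwidth, then fill the best-connected nodes first."""
    clips = sorted(clip_freqs, key=clip_freqs.get, reverse=True)
    nodes = sorted(node_out_bw, key=node_out_bw.get, reverse=True)
    placement, free = {}, dict(node_capacity)
    for clip in clips:
        for node in nodes:                 # best-connected node with room
            if free[node] >= 1:
                placement[clip] = node
                free[node] -= 1
                break
    return placement

freq = {"news": 9, "movie": 5, "cartoon": 1}   # accesses, hypothetical
bw = {"tv": 54, "laptop": 24, "dvr": 11}       # outgoing Mbps, hypothetical
cap = {"tv": 1, "laptop": 1, "dvr": 1}         # clips per device
print(place_clips(freq, bw, cap))
# {'news': 'tv', 'movie': 'laptop', 'cartoon': 'dvr'}
```

The most popular clip ends up on the device that can serve the most concurrent streams, which is exactly what lowers the chance of a bottleneck link.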
In almost all our experiments, Domical outperforms
DCOORD. In Figure 2.a, we show the percentage improvement in startup latency observed by Domical when compared with DCOORD for different distributions of access
to clips and α = 0 using the network bandwidth observed
from the UK household of [7]. In this experiment, we
vary the total cache size (ST ) on the x-axis. Even with
an access distribution that resembles a uniform distribution
(μ = 0.25), Domical outperforms DCOORD because it materializes a larger number of unique clips across the cooperative cache.
Figure 3. Percentage improvement with block-based caching when compared with clip-based caching using Domical: 3.a) μ = 0.73; 3.b) UK1X.
The bandwidth of the wireless links has an impact on
the margin of improvement provided by Domical. This is
shown in Figure 2.b, where we analyze the impact of scaling down the wireless link bandwidths by a factor of two and four relative to the original observed link bandwidths, termed UK0.5X and UK0.25X, respectively. The percentage improvement observed by Domical drops because the bandwidth of the wireless links is so low that the likelihood of bottleneck formation is very high.
One may improve the startup latencies observed with
Domical by changing the granularity of caching from clip
to block. This is because Domical pre-stages the first few
blocks of different clips across the network strategically
in order to minimize the startup latency. This is shown
in Figure 3 where we report on the percentage improvement observed with block caching when compared with clip
caching (for Domical). Note that when either the available cache space or the bandwidth of wireless network connections is scarce (low ST/SDB ratios in Figure 3.a, or UK0.25X), caching at the granularity of a clip is the right choice.
is because, with block-based caching, the remainder of each
clip referenced by every device may involve the infrastructure outside the home, exhausting the wireless network
bandwidth of the home gateway.
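A back-of-the-envelope model shows why prestaging the first blocks helps (a simplified sketch assuming sequential block fetches over a single link; this is not the paper's simulation model):

```python
def startup_latency(n_blocks, prestaged, block_time, display_bw, link_bw):
    """Hiccup-free startup latency (seconds) for a clip of n_blocks when
    the first `prestaged` blocks are cached locally and the rest stream
    sequentially over a link of bandwidth link_bw."""
    fetch = block_time * display_bw / link_bw   # seconds to fetch one block
    worst = 0.0
    for i in range(prestaged + 1, n_blocks + 1):
        # block i must arrive before its display slot begins
        worst = max(worst, (i - prestaged) * fetch - (i - 1) * block_time)
    return max(0.0, worst)

# 4 Mbps clip over an 8 Mbps link, 10-second blocks, 30-minute clip
print(startup_latency(180, 0, 10, 4, 8))   # 5.0: must wait for half a block
print(startup_latency(180, 2, 10, 4, 8))   # 0.0: prestaged blocks hide the fetch
```

As long as the link bandwidth exceeds the display bandwidth, even a couple of prestaged blocks drive the startup latency to zero, which matches the gains reported for block caching.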
3. Conclusion
The asymmetric and limited bandwidth of wireless connections between devices in a household make a compelling
case for a cooperative caching technique such as Domical.
This is because Domical assigns data to the available cache
space of different devices with the objective to minimize the
likelihood of bottleneck links in the network. In this paper,
we presented a qualitative and quantitative comparison of Domical
with DCOORD. The qualitative analysis shows Domical is
not a substitute for DCOORD outside the home. The quantitative analysis shows Domical enhances average startup
latency significantly when compared with DCOORD inside
the home.
References
[1] H. Bahn. A Shared Cache Solution for the Home Internet Gateway. IEEE Transactions on Consumer Electronics,
50(1):168–172, February 2004.
[2] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web
Caching and Zipf-like Distributions: Evidence and Implications. In Proceedings of INFOCOM, pages 126–134, 1999.
[3] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, editors. Introduction to Algorithms, chapter 26.2. MIT Press, 2001.
[4] S. Ghandeharizadeh and S. Shayandeh. Greedy Cache Management Techniques for Mobile Devices. In Proceedings of
ICDE, pages 39–48, April 2007.
[5] S. Ghandeharizadeh and S. Shayandeh. Domical Cooperative
Caching: A Novel Caching Technique for Streaming Media
in Wireless Home Networks. In Proceedings of SEDE, pages
274–279, June 2008.
[6] S. Ghandeharizadeh, S. Shayandeh, and Y. Altowim. An
Analysis of Two Cooperative Caching Techniques for Streaming Media in Residential Neighborhoods. Technical Report
2009-02, USC Database Laboratory, Los Angeles, CA, 2009.
[7] K. Papagiannaki, M. Yarvis, and W. S. Conner. Experimental
characterization of home wireless networks and design implications. In Proceedings of INFOCOM, pages 1–13, April
[8] J. R. Quain. Cable Without a Cable Box, and TV Shows Without a TV. The New York Times, Technology Section, July 26
PopCon monitoring: web application for detailed
real-time database transaction monitoring
Ignas Butėnas∗ , Salvatore Di Guida† , Michele de Gruttola† , Vincenzo Innocente† , Antonio Pierro‡ ,
∗ Vilnius
University, 3 Universiteto St, LT-01513 Vilnius, Lithuania
† CERN Geneva 23, CH-1211, Switzerland
‡ INFN-Bari - Bari University, Via Orabona 4, Bari 70126, Italy
Abstract—The physicists who work in the CMS experiment
at the CERN LHC need to access a wide range of data coming
from different sources whose information is stored in different
Oracle-based databases, allocated in different servers. In this
scenario, the task of monitoring different databases is a crucial
database administration issue, since different information may
be required depending on different users’ tasks such as data
transfer, inspection, planning and security issues. We present
here a web application based on a Python web framework, AJAJ
scripts and Python modules for data mining purposes.
To customize the GUI we record traces of user interactions
that are used to build use case models.
In addition, the application detects errors in database transactions (for example, identifying mistakes made by users, application failures, unexpected network shutdowns, or Structured Query Language (SQL) statement errors) and provides warning messages
from the different users’ perspectives.
In the CMS experiment[1] [2], heterogeneous resources and
data are put together in different Oracle-based databases, and
made available to users for a variety of different applications,
such as the calibration of the various subdetector components
and the reconstruction of all physical quantities.
In this complex environment it is absolutely necessary to
monitor database resources and every application that performs database transactions, in order to detect faulty situations,
contract violations and user-defined events.
PopCon monitoring (Populator of Condition Objects monitoring) is an Open Source web-based service, implemented in Python and designed for a heterogeneous database server that performs data transfers, providing both fabric and application monitoring.
It promotes the adoption of standard web technologies,
service interfaces, protocols and data models.
One of the main challenges for CMS users is to monitor
their own database transactions. Moreover, different types of
users need different data aggregation views depending on
their role. To provide a first solution for such requirements, a
new group level data aggregation, based on use case models,
provided by a recorded user interaction sequence, has been
recently added to PopCon monitoring.
The organization of this paper is the following: section 2
presents the PopCon tool[3] and its main features, section 3
presents PopCon monitoring architecture and features, section
4 explains how PopCon monitoring allows users, according to their previously recorded user interactions, to monitor their resources and applications, and finally section 5 sums up the work.
PopCon[3] (Populator of Condition Objects tool) is an
application package fully integrated in the overall CMS
framework[4] intended to store, transfer and retrieve data using
A proper reconstruction of physical quantities needs data
which do not come from collision events of the CMS experiment: these “non event” data (Condition data), therefore, are
stored in ORACLE Databases.
The condition data can be roughly divided into two groups: conditions from the various detector systems describing their state (gas values, high and low voltages, magnetic field, currents and so on), and calibration constants of the single CMS
sub-detector devices, mainly evaluated in the offline analysis
(pedestals, offsets, noises, constants of alignment).
CMS relies on three ORACLE databases for the condition data:
• OMDS (Online Master Database System), a purely relational database hosting online condition data from the various
CMS sub-detectors;
• ORCON (Offline Reconstruction Condition DB Online
System), an object-oriented database hosting conditions
and calibrations needed for the high level trigger and offline event reconstruction, populated using POOL-ORA1
• ORCOFF (Offline Reconstruction Condition Database Offline System), a master copy of ORCON in the CERN network, populated through ORACLE streaming.
Calibration and Condition data coming from the subdetectors’ computers, from network devices and from different sources (databases, ASCII files, ROOT2 files, etc.) are
packed as C++ objects and moved to the Online condition
database (ORCON) via a dedicated software package called
PopCon. The data are then automatically streamed to the
offline database (ORCOFF) and become accessible in the
offline network as C++ objects. All these database transactions
generate logs which are stored in tables of a dedicated account
1 POOL is the common persistency framework for physics applications at
the LHC.
2 ROOT is an object-oriented program and library developed by CERN and
designed for particle physics data analysis.
on CMS databases, so that every transaction is traceable to a
single user.
Even without LHC[5] beams, which are expected for the autumn of this year, this mechanism was used intensively and successfully during the 2008 tests with cosmic rays, and it is now under
further development. Up to now, 0.5 TB of data per year have
been stored into the CMS Condition Databases.
PopCon monitoring is structured in five main components
(see Figure 1):
Fig. 3. PopCon Activity History: with the help of the mouse, users can interact directly with the charts (there are different types of them). Users can point the cursor at a part of the chart to see information about transactions. Charts display the accounts on which transactions were done, their date and time, and the occurrences. This picture shows an example of the linear chart.
A. PopCon API DB Interface
The PopCon API DB Interface is a Python script that gives
access to the PopCon account on the Oracle Database. This
component uses the cx_Oracle python module to connect
to Oracle DBs and call various PL/SQL package methods.
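A minimal sketch of such an interface follows (hypothetical: the table and column names are invented and do not reflect the real CMS schema; the connection path needs the cx_Oracle module and a reachable Oracle server, while `rows_to_dicts` is a pure helper):

```python
def rows_to_dicts(columns, rows):
    """Pure helper: pair lower-cased column names with each fetched row."""
    return [dict(zip(columns, row)) for row in rows]

def fetch_transaction_log(dsn, user, password):
    """Connect to the (hypothetical) PopCon log account and return the
    database transaction log as a list of dictionaries."""
    import cx_Oracle  # deferred import: only needed for a real connection
    connection = cx_Oracle.connect(user, password, dsn)
    try:
        cursor = connection.cursor()
        cursor.execute(
            "SELECT account, executed_at, payload_size FROM popcon_log")
        columns = [d[0].lower() for d in cursor.description]
        return rows_to_dicts(columns, cursor.fetchall())
    finally:
        connection.close()

# the helper alone can be exercised without a database:
print(rows_to_dicts(["account", "payload_size"], [("ecal", 42)]))
```

In the real component, PL/SQL package methods would be invoked through the same cursor; the sketch shows only a plain query for brevity.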
B. PopCon user Interaction Recorder
Fig. 1. PopCon monitoring architecture.
• the PopCon API DB Interface retrieves the entities monitored by the PopCon tool;
• the PopCon user Interaction Recorder is a collection that retains an interaction history for each user;
• the PopCon data-mining extracts patterns from the data, the entities monitored by the PopCon tool and the history of recorded user interactions, hence transforming them into information such as warnings, errors or alarms according to use case models;
• the PopCon info collector aggregates the information produced by the different database transactions and the history of recorded user interactions, and encodes them in JSON3 format;
• the PopCon Web Interface displays the information about the database transactions from the different user perspectives, organizing data in tables (see Figure 2) and/or charts (see Figure 3).
Fig. 2. The PopCon web interface represents information about database
transactions in different types: both charts and tables. A user can easily add
or remove columns by clicking the checkbox and also columns can be sorted.
Information could be grouped according to different filters.
3 JSON (JavaScript Object Notation) is a lightweight data-interchange format.
This component creates and makes accessible the records of
activities made by each user. Collected records are used to implement and improve a web interface, which can be designed
for information browsing for different users in different ways.
This component interacts with, and receives information from
the PopCon Web Interface.
C. PopCon data-mining
Through the use of sophisticated algorithms this component
can extract information from logs of database transactions
(operator, data source, date and time, metadata) and the
PopCon User Interaction Recorder (sequence of actions to get
to the right contents, average time on each page to compute
the attention applied by the visitor) finding existing patterns
in data.
1) Algorithm used to scan the history of recorded user interactions: This algorithm iterates two main steps.
The first step, called harvesting user interaction statistics, records the following list of measurements, subdivided into two groups:
• tracks of the browsed page, like most requested pages,
least requested pages, most accessed directory, average
Time on Page, average Time on Site, ordered sequence
of visited pages, new versus returning visitors (by means
of cookies) and the number of views per each page.
• tracks of user activity at the page level:
– Changing attributes of graphical elements: (e.g.
changing charts representation from line chart to pie
chart or histogram chart, sorting and filtering data in
a table)
– Removing/adding object elements (e.g. remove/add
columns to the table)
The second step, called grouping attributes of user interaction with significant correlation, gathers into different subgroups the tracked user activities and tracked browsed pages that have
similar attributes, like most accessed directory and common
graphics elements, in order to create mutually exclusive collections of user interactions sharing similar attributes.
To reach this goal, we use an algorithm handling mathematical and statistical calculations, such as probability and
standard deviation, to uncover trends and correlations among
the attributes of the user interaction.
For example, after scanning the history of recorded user
interactions, an association rule “the user that visits page one
also visits page two and chooses to see histogram reports
(90%)” states that nine out of ten users that visit page one also visit page two and prefer to see the histogram chart. We can build use case models based on these statistics, in order to reflect the requirements and the needs of each user. As a result, a user classified under this use case will benefit from a web interface based on his perspective, helping him to find and manage the information he needs more quickly.
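The confidence of such an association rule can be computed directly from recorded sessions, as in this sketch (the session data and action names are invented; the paper's algorithm additionally uses probability and standard deviation calculations):

```python
def rule_confidence(sessions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: among the
    recorded sessions containing every antecedent action, the fraction
    that also contain every consequent action."""
    matching = [s for s in sessions if antecedent <= s]   # subset test
    if not matching:
        return 0.0
    return sum(1 for s in matching if consequent <= s) / len(matching)

# hypothetical recorded interaction sessions (sets of pages/chart actions)
sessions = [
    {"page1", "page2", "histogram"},
    {"page1", "page2", "histogram"},
    {"page1", "page3"},
    {"page2"},
]
# two of the three sessions visiting page1 also visit page2 with histograms
print(rule_confidence(sessions, {"page1"}, {"page2", "histogram"}))
```

Rules whose confidence crosses a chosen threshold (the 90% of the example above) would then feed the use case models.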
2) Algorithm used to scan the PopCon logs: PopCon is
integrated within the CMSSW framework which depends on
different tools like POOL and CORAL4 and on database
software like ORACLE and SQLite. This application can be
used in two different ways:
• since it is integrated in the framework, users can write
python scripts which are executed by the framework
executable cmsRun.
• the framework itself provides an application which, using
PopCon libraries, allows the exportation of data into the
offline database.
These applications are responsible for maintaining and handling operations which are related to database transactions.
In this scenario, it is very difficult to catch all error messages
coming from different heterogeneous resources. Therefore, we
follow this strategy: every application provides an error output
consisting of three components: the name of application, the
error code, that is unique for each tool, and the description of
the error itself. So, PopCon developers can clearly understand
what is wrong with their tool, while the end-user is able to
check if the data exportation (database transaction) they want
to perform was successful or not.
This error metric, for each tool, is provided by the framework developers in XML format in order to make it independent from the message sent to stdout and/or stderr.
Besides describing what the error is and how it occurred,
most error messages provide advice about how to correct the problem.
To help both users and developers correctly classify the observed damage, the error messages are classified by the level of the issue, each with a different colour. These levels are:
• Fatal. The program cannot continue (red colour).
• Major (Error). The program has suffered a loss of functionality, but it continues to run (orange colour).
4 CORAL is a software toolkit (which is part of the LCG Persistency
Framework) providing the set of software deliverables of the ”Database Access
and Distribution” work package of the POOL project.
• Minor (Warn). There is a malfunction that is a nuisance,
but it does not interfere with the program’s operation
(deep green colour).
• Informational. Not an error, this is related information
that may be useful for troubleshooting (green colour).
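The level-to-colour classification can be sketched as follows (the XML schema below is invented; the paper only states that each error carries the application name, a unique code and a description, delivered in XML):

```python
import xml.etree.ElementTree as ET

# Colours used by the web interface for each issue level (from the text)
LEVEL_COLOURS = {
    "Fatal": "red",
    "Major": "orange",
    "Minor": "deep green",
    "Informational": "green",
}

# Hypothetical error-metric file: the real XML layout used by the CMSSW
# framework developers is not shown in the paper.
ERROR_METRIC_XML = """
<errors tool="PopCon">
  <error code="E001" level="Fatal">Cannot open destination database</error>
  <error code="E002" level="Minor">Duplicate payload skipped</error>
</errors>
"""

def classify_errors(xml_text):
    """Return (tool, code, description, colour) for each defined error."""
    root = ET.fromstring(xml_text)
    tool = root.get("tool")
    return [
        (tool, e.get("code"), e.text, LEVEL_COLOURS[e.get("level")])
        for e in root.findall("error")
    ]

for entry in classify_errors(ERROR_METRIC_XML):
    print(entry)
```

Keeping the metric in XML, as the paper notes, decouples the classification from whatever the tool prints to stdout or stderr.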
As a further example, we describe another kind of error
not depending on the particular application, but on Hardware/Software/Network problems. To discover this kind of
error, we perform a time series analysis on database transactions associated with the discovery and use of patterns such as
periodicity. Since dates and times of the database transactions
are recorded along with the users information, the data can be
easily aggregated into various forms equally spaced in time.
For example, for a specific account the granularity of database transactions could be hourly, while for another account it could be daily. This information allows us to discover two main kinds of problems:
• Scanning the entities monitored by PopCon (logs of
database transactions), the association rule “during a long
period, a specific user performs a database transaction
at regular time intervals” states that, probably, if these
regular intervals suddenly change without a monitored
interaction by an administrator, and, for particular cases,
by the user, there can be network connectivity problems,
or machine failures on the network. In detail, if the system finds an exception to this pattern in the data, it triggers an
action to inform a user about possible problems by email.
Besides, the web user interface provides red/orange/green
alarms, according to the seriousness of the problem, so
that this exception is immediately visible by the user.
• Taking the size of data together with the periodicity of database transactions, we can forecast the rate at
which disk capacity is being filled in order to prevent
a disk becoming full, alerting the database manager and
the administrators of the machines dedicated to the data
exportation some days in advance.
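Both checks reduce to simple arithmetic over recorded timestamps and transaction sizes, as in this sketch (the tolerance threshold and the linear fill model are assumptions, not the paper's exact method):

```python
def interval_anomaly(timestamps, tolerance=0.5):
    """Flag when the latest gap between transactions deviates from the
    usual period by more than `tolerance` (as a fraction): a possible
    network connectivity problem or machine failure."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    usual = sum(gaps[:-1]) / len(gaps[:-1])   # average of the earlier gaps
    return abs(gaps[-1] - usual) > tolerance * usual

def days_until_full(free_bytes, transaction_sizes, period_days):
    """Forecast, with a linear model, how many days remain before the
    disk dedicated to data exportation becomes full."""
    avg_size = sum(transaction_sizes) / len(transaction_sizes)
    return free_bytes / (avg_size / period_days)

print(interval_anomaly([0, 1, 2, 3, 7]))                 # True: last gap tripled
print(days_until_full(100e9, [2.1e9, 1.9e9, 2.0e9], 1))  # 50.0 days
```

A flagged anomaly would trigger the e-mail and the red/orange/green alarms described above; the forecast gives the "some days in advance" warning to the administrators.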
D. PopCon Info Collector
The PopCon Info Collector retrieves data from the PopCon
API DB and the PopCon User Interaction Recorder. This
component interacts with PopCon Data-mining to find existing
patterns in data previously taken, and, finally, encodes them in
JSON format, providing the result to the PopCon Web Interface
(see figure 1).
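A toy version of this aggregation step might look as follows (field and account names are invented; only the JSON encoding matches the paper's description):

```python
import json

def collect_info(transactions, interactions):
    """Sketch of the info collector: aggregate per-account transaction
    counts plus the recorded interaction history, and encode the result
    as JSON for the web interface."""
    per_account = {}
    for t in transactions:
        per_account[t["account"]] = per_account.get(t["account"], 0) + 1
    return json.dumps(
        {"transactions_per_account": per_account, "interactions": interactions},
        sort_keys=True)

payload = collect_info(
    [{"account": "ecal"}, {"account": "ecal"}, {"account": "tracker"}],
    ["page1", "histogram"])
print(payload)
```

The JSON payload is what the AJAJ calls of the web interface fetch asynchronously.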
E. PopCon Web Interface
The system has a front-end Apache server and backend application servers. The PopCon Web Interface is an
application created with a Python-based framework using
the Cheetah template engine to structure the web site. The
PopCon Web Interface is built on the CherryPy framework
application server, which runs behind Apache, providing a security module to automatically show a role-optimized view of
the system and its controls. A set of reusable components,
known as “widgets”, are being made available. These are
usually built using the jQuery libraries and are written in
CSS and JavaScript. Where possible, these are reused in order
to provide identical functionality across different components,
so that a user feels comfortable with a standard style sheet
for all web tools. The services run on a fairly standard
configuration: a pair of Apache servers working as a load
balanced proxy in front of many application servers. The front
end servers are accessible to the outside world, while the back
end machines are firewalled off from remote access[6]. With
this infrastructure we can minimize problems related with
security issues: in particular, each user is unable to handle
database objects. Thanks to AJAJ5 we can provide real-time
feedback to our users exploiting server-side validation scripts,
and eliminate the need for the redundant page reloads that are necessary when pages change. In fact, this component allows requests to be sent asynchronously and data to be loaded from
the server. The PopCon Web Interface uses a programming
model with display and events. These events are user actions:
they call functions associated to elements of the web page
and then actions are recorded by the PopCon user Interaction
Recorder. The contents of pages coming from different parts
of the application are extracted from JSON files provided by
the PopCon Info Collector.
The design of the presentation of the data collected by
PopCon monitoring is based on the requirements given by
different types of users, each of them having to do with a
different abstraction level of a Database administration issue:
the ORACLE Database Administrator level, the central CMS
detector level, the CMS sub-detector level and the End-User level:
• The ORACLE Database Administrator may wish to face up to database security issues for which he is responsible. Typical examples that can be detected are:
– which people on the inside (using the PopCon tool) and outside (using the PopCon Web Interface) of the network can access data, and what these users do;
– programs accessing a database concurrently in order
to avoid further multiple access to the same account;
– whether all such processing leaves the database or data store
in a consistent state;
– illegal entries by hackers;
– malicious activities such as stealing the content of databases;
– data corruption resulting from power loss or surge;
– physical damage to equipment;
• The central CMS detector manager and the PopCon
tool developer may require the possibility of analysing
the behaviour of their applications for each CMS subdetector.
• The sub-detector CMS manager may require the possibility to analyse the behaviour of his transactions on his
own sub-detector database account.
• The End-User may require the possibility to analyse the
behaviour of his own personal transaction such as size and
rate/duration of the transactions, or detect fault situations
related to insufficient password strength or inappropriate
access to critical data such as metadata.
To summarize, PopCon monitoring automatically detects the
cookies installed in each user’s browser and this information
is used to match the user with a role (Oracle Database
Administrator, PopCon tool developer, sub-detector CMS manager, End-User) in order to provide a customized report that
allows each user to have a customized printout of information
depending on his needs.
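The cookie-to-role matching can be sketched as follows (the cookie name, role keys and view names are all invented; the paper does not describe the actual mechanism):

```python
# Hypothetical mapping from a role cookie to the customized report view.
ROLE_VIEWS = {
    "dba": "oracle-administrator-report",
    "central": "central-cms-detector-report",
    "subdetector": "sub-detector-report",
    "enduser": "end-user-report",
}

def view_for_cookies(cookies):
    """Pick the customized report for the role found in the browser
    cookies, falling back to the end-user view."""
    role = cookies.get("popcon_role", "enduser")
    return ROLE_VIEWS.get(role, ROLE_VIEWS["enduser"])

print(view_for_cookies({"popcon_role": "dba"}))   # oracle-administrator-report
print(view_for_cookies({}))                        # end-user-report
```

Falling back to the most restricted view keeps an unrecognized or missing cookie from ever exposing administrator-level information.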
The use of data mining techniques to extract patterns from
logs of database transactions (operator, date and time) and
the history of recorded user interactions has some general
advantages. The storage of these patterns will help the user
to read and understand quickly the current situation without
going through several pages and use the search fields.
Although the number of samples analysed here is limited,
the applied approach demonstrates that our open source application is dynamic, since it can work with and parse the different types of data for which the date is a primary key.
Dates can be written in many different ways, thanks to flexible Python functions which work with dates and parse them.
Another important feature of this application is that the
PopCon User Interaction Recorder could be used in combination with PopCon data-mining to provide almost the same
functionality in general for any application. It’s indeed a
flexible part which helps to collect and interpret information about a user's activities and the actions made while handling the application. This information can also be used to provide
new and comfortable features for users, as we are using it to
adapt the PopCon Web Interface to the user’s needs.
References
[1] The CMS Collaboration. CMS Physics TDR, Volume I: Detector
Performance and Software. Technical Report CERN-LHCC-2006-001;
CMSTDR-008-1, CERN, Geneva, 2006.
[2] The CMS Collaboration. CMS Physics Technical Design Report, Volume
II: Physics Performance. J. Phys. G, 34(6):995–1579, 2007.
[3] PopCon (Populator of Condition Objects). First experience in operating
the population of the “condition database” for the CMS experiment.
International Conference on Computing in High Energy and Nuclear
Physics, March 2009
[4] CMS Computing TDR, CERN-LHCC-2005-023,
record/838359 20 June 2005.
[5] The LHC Project. LHC Design Report, Volume I: the LHC Main Ring.
Technical Report CERN-2004-003-V-1, CERN, Geneva, 2004.
[6] CMS conditions database web application service. International Conference on Computing in High Energy and Nuclear Physics, March 2009
5 Asynchronous JavaScript and JSON.
Using MPEG-21 to repurpose, distribute and
protect news/NewsML information
Pierfrancesco Bellini, Ivan Bruno, Paolo Nesi
DISIT-DSI, Distributed Systems and Internet Technology Lab
Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy, [email protected], [email protected]
Moreover, news items frequently contain videos and images, while the solution proposed by NewsML of zipping the files forces users to unzip them into some directory before the video can be accessed and played. In addition, news items frequently contain sensitive data for which protection of IPR (intellectual property rights) is needed. Thus, most of the above mentioned formats present a number of problems, such as limitations of the adopted packaging format (for example, the NewsML packaging prevents video content from being played effectively from within the package without decompressing and/or unpacking it) and limitations on the protection and preservation of IPR. Such problems concern the file format and the protection support, including certification, content signature and licensing.
Among the formats mentioned, the AXMEDIS implementation of the MPEG-21 file format and MXF support direct play. Only MPEG-21 also supports a range of business and transaction models via a DRM (Digital Rights Management) solution and a set of technological protection tools.
In this paper, a solution to the above mentioned problems of news modeling, massive production, processing and distribution is presented. The proposed solution is based on the AXMEDIS content model and processing GRID platform, AXCP. AXCP provides a set of technical solutions and tools to automate cross-media content processing, production, packaging, protection and distribution. AXMEDIS multimedia processing can cope with a large number of formats, including MPEG-21, and it can work with a multichannel architecture for the production of content on demand [3]. AXMEDIS is a framework that has been funded by the European Commission and has been developed by many partners including: University of Florence, HP, EUTELSAT, TISCALI, EPFL, FHG-IGD, BBC, AFI, Universitat Pompeu Fabra, University of Leeds, STRATEGICA, EXITECH, XIM, University of Reading, etc.
The distribution of news is a very articulated and widespread practice, and one of the most widely used formats for news production and distribution is NewsML. The management of news has some peculiarities that could be satisfied by using MPEG-21 as a container, together with the related production tools and players. To this end, an analysis of modeling NewsML with MPEG-21 has been performed and is reported in this paper. The work has been performed for the AXMEDIS project, a large IST Research and Development Integrated Project of the European Commission.
1. Introduction
At present, there is a large number of content formats, ranging from simple files (documents, video, images, audio, multimedia, etc.) to integrated content models for packaging such as MPEG-21 [1], [5], SCORM, MXF, NewsML [6], SportML, etc. These formats are used to describe resources/essences and, in some cases, to wrap them in a digital container, so as to make them ready and simpler to deliver. Among these formats, the ones used for distributing and sharing news are mainly text and XML oriented, such as NewsML of the IPTC (International Press Telecommunication Council). Recently a new version of NewsML has been proposed, NewsML-G2, which provides support for referencing textual news and resource files and for packaging them, while collecting metadata, descriptors, vocabularies, etc.
Furthermore, news items are typically massively processed by news agencies and/or by TV news redactions. They are received not only in NewsML format but also in HTML, plain TXT and PDF formats. The agencies and redactions need to move, transcode and adapt them to different formats, processing both text and digital essences by changing resolution, summarizing text, adapting descriptive metadata, etc. In some cases, the adaptation has to be performed on demand, as the answer to a query or request made to a database or a web service.
The processing tools and algorithms range, for example, over audio, video and image adaptation, transcoding and encryption, and possible customized algorithms and tools can be handled as well.
As to the processing capabilities, an AXCP Rule formalises, in its own language, activities of ingestion, query and retrieval, storage, adaptation, extraction of descriptors, transcoding, synchronisation, fingerprinting, indexing, summarization, metadata manipulation and mapping via XSLT, packaging, protection and licensing in MPEG-21 and OMA, and publication and distribution via traditional channels and P2P.
More technical information, as well as instructions on how to register and affiliate with AXMEDIS, can be recovered from the AXMEDIS web portal.
In order to solve the problems described above, the AXCP solution has been augmented with semantic processing capabilities, NewsML modeling and a conversion strategy into the AXMEDIS MPEG-21 format, with the aim of preserving the semantics and capabilities of the original news files [4], [5]. In this case, the MPEG-21 models and tools have been used: (i) as a descriptor and/or a container (with the AXMEDIS file format) of information and multiple file formats; (ii) as a vehicle to protect the IPR when the information is distributed towards non-protected channels or contains sensitive information.
The paper is organized as follows. In Section 2, a short overview of the AXMEDIS content processing platform for multimedia processing is given. Section 3 describes the modeling of NewsML into MPEG-21, and Section 4 reports implementation details regarding the AXCP. An analysis of the advantages identified in using the AXMEDIS model and tools is reported in Section 5. Conclusions are drawn in Section 6.
3. From NewsML to AXMEDIS modeling
passing via MPEG-21
NewsML has a structure with 4 nested levels (from the container down to the smaller components): NewsML, NewsItem, NewsComponent and ContentItem.
The NewsComponent mainly contains the information that may be used for modeling the NewsItems. Finally, the ContentItem describes the contribution in terms of comments, classification, media type, format, notation, etc. NewsML also has metadata mapped in the architecture, in particular Descriptive Metadata and Rights Metadata. The information for news identification is reported in the NewsItems, each of which can be univocally identified.
On the basis of our analysis, we have identified 6 main entities which have to be addressed: NewsML, NewsItem, NewsComponent, ContentItem, TopicSet and Catalog (see Figure 1).
2. AXMEDIS Content Processing
The AXCP tool is based on a GRID infrastructure constituted of a Rule Scheduler and several Executors that execute processes. AXCP Rules are formalized in AXCP Java Script [2], [4]. The AXCP Rule Scheduler performs rule firing, discovers Executors and manages possible problems. The scheduler may receive commands (to invoke a specific rule with some parameters) and provide reporting information (notifications, exceptions, logs, etc.) to external workflows and tools by means of a Web service.
The Rule Executor receives the Rules to be executed from the Scheduler and performs the initialization and launch of each Rule. During the run, the Executor can send notifications, errors and output messages to the Scheduler. Furthermore, the Executor can invoke the execution of other Rules by sending a specific request to the Scheduler, in order to divide a complex Rule/procedure into sub-rules/procedures running in parallel, thus allowing a rational use of the computational resources accessible in the content factory, on the GRID. This solution maintains the advantages of a unified solution and enhances the capabilities and scalability of the AXMEDIS Content Processing.
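The splitting of a complex rule into sub-rules executed in parallel can be roughly illustrated as follows. This is a sketch in Python using threads; the actual AXCP rules are written in AXCP Java Script, and all names here are ours:

```python
from concurrent.futures import ThreadPoolExecutor

def sub_rule(item):
    # Stand-in for a real processing step (adaptation, transcoding, ...).
    return item.upper()

def run_complex_rule(items, max_workers=4):
    """Divide the work across parallel sub-rules and collect the results,
    preserving the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sub_rule, items))
```

In the real platform the parallel units are separate Rule Executors discovered by the Scheduler on the GRID, not threads in one process.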
The AXCP processing tools are supported by a plugin technology which allows each AXCP Rule Executor to dynamically link any content processing tool or algorithm.
Figure 1 – NewsML main entities
The resulting model is hierarchical; in order to be ingested, analyzed and converted, it has been replicated into an object-oriented model that allows us to represent it in memory, taking into account the relationships and roles of its entities, as in the UML diagram reported in Figure 2.
The AXMEDIS view is simply a more abstract view of the ISOMEDIA-based AXMEDIS file format. The AXMEDIS mapping is more effective and easier to understand than the underlying MPEG-21 modeling, which is fully flat and hard for humans to read. The resulting MPEG-21 container of the news can be protected by using MPEG-21 REL and the AXMEDIS tools for DRM.
Figure 2 – Modeling NewsML main entities for conversion and analysis
In addition, other classes have been implemented to model NewsML, such as Topic, NewsMLDocument, NewsComponent and NewsItem, the latter specialised from both NewsMLElements and ContentAttribute. The proposed model allows the NewsML structures to be ingested quickly.
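A minimal sketch of such an ingestion step, assuming plain NewsML XML and using only a few of the entities (the class layout is ours and far simpler than the real AXMEDIS model):

```python
import xml.etree.ElementTree as ET

class ContentItem:
    def __init__(self, href):
        self.href = href

class NewsComponent:
    def __init__(self):
        self.content_items = []

class NewsItem:
    def __init__(self):
        self.components = []

def ingest(newsml_xml):
    """Parse a NewsML document string into the object model above."""
    root = ET.fromstring(newsml_xml)
    items = []
    for item_el in root.iter("NewsItem"):
        item = NewsItem()
        for comp_el in item_el.iter("NewsComponent"):
            comp = NewsComponent()
            for ci_el in comp_el.findall("ContentItem"):
                comp.content_items.append(ContentItem(ci_el.get("Href")))
            item.components.append(comp)
        items.append(item)
    return items
```

Once the structures are in memory, extraction of a NewsComponent, addition of news, or conversion to other formats becomes a matter of walking and rewriting these objects.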
The realized model allows the needed transformations to be performed on the NewsML files in an efficient manner: for example, the extraction of a NewsComponent by removing its parts from the tree, the addition of news, etc., together with the conversion of the NewsML into other formats such as XML, HTML, text files and MPEG-21, as described in the following.
The resulting model has also been analyzed to map the information into the MPEG-21 structure of the DIDL (Digital Item Declaration Language).
Table 1 – Mapping concepts of NewsML to the MPEG-21 view
Figure 3 – A NewsML file in the AXMEDIS Editor
In Table 1, a mapping of the NewsML elements to those of MPEG-21 and AXMEDIS is provided. The AXMEDIS editor allows both the MPEG-21 and the AXMEDIS views of the NewsML file to be seen, as depicted in Figure 3. In the AXMEDIS view of Figure 3, the nesting levels of the AXMEDIS objects are evident; they can be moved or extracted simply using drag and drop. The same approach can be adopted to work with single contributions, i.e., text and/or digital files (images, video, etc.), which can be played directly in the editor and in the AXMEDIS player. An additional feature is the HTML index of the converted NewsML items, automatically produced by processing the NewsML structure in the AXCP script. That index is an HTML file embedded into the AXMEDIS Object (see the bottom of the tree in Figure 4).
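The index generation can be imagined along these lines (a hedged sketch; the real AXCP script and the item structure differ):

```python
from html import escape

def build_html_index(items):
    """items: list of (title, target) pairs; returns a minimal HTML index."""
    entries = "\n".join(
        '<li><a href="{0}">{1}</a></li>'.format(escape(target, quote=True),
                                                escape(title))
        for title, target in items
    )
    return "<html><body><ul>\n" + entries + "\n</ul></body></html>"
```

The generated page is then packaged into the AXMEDIS Object alongside the converted news items it points to.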
6. Conclusions
In this paper, an analysis of the modelling of NewsML, and of news in general, with MPEG-21 has been performed and presented. The results demonstrate that the structure of the news can be quite easily modelled in MPEG-21. In addition, the news processing consisting of ingestion and transcoding can be performed on the AXCP platform in a quite easy manner, since an ingestion module for NewsML has now been developed and added. As a result, a number of advantages have been identified and demonstrated, as reported in Section 5. The full documentation can be recovered from the AXMEDIS web portal. AXMEDIS is an open platform, which means that you can join the AXMEDIS community. The example mentioned in this paper is accessible from the same web portal.
4. Implementation on the AXCP GRID
The above mentioned object-oriented module for NewsML ingestion, modelling and processing has been added to the AXCP Node engine. Therefore, a set of functionalities (an API) to access the NewsML models has been defined and made directly accessible from the AXCP Java Script multimedia processing language.
Acknowledgments
The authors would like to express their thanks to all the AXMEDIS project partners, including the Expert User Group and all the affiliated members, for their contribution, funding and collaboration efforts. A specific acknowledgment goes to the EC IST for partially funding the AXMEDIS project. A warm thanks to all the AXMEDIS people who have helped us in starting up the project; we apologize to those who have not been involved in the paper or mentioned, and we trust in their understanding.
5. Benefits and results
The solution based on AXCP made it possible to set up flexible automatic processes where NewsML information is ingested and processed in a very efficient manner, coping with any kind of condition and structure, repurposing and adapting news, including text and digital essences, towards different formats (HTML, TXT, PDF, MPEG-21, SMIL, etc.), integrating the digital essences into them or not, and distributing them via email, posting on FTP, on databases, etc.
Besides, modeling news with AXMEDIS has some advantages, as the resulting AXMEDIS object can be:
- used as a news descriptor and/or a news container (with the AXMEDIS file format), supporting any kind of file format for the digital essences integrated into the news;
- used to manipulate the news, to add other information via the AXMEDIS Editor, and to directly play the essences contained in the news without extracting them from the package;
- searched in the internal body of the news object, thus making the understanding and browsing of complex news easier, by adding simple intelligent methods such as the ones described in [5];
- annotated in conformance with MPEG-21;
- IPR protected when the information is distributed towards non-protected channels or contains sensitive information;
- distributed in several manners and accessed via PC, PDA, etc.
[1] MPEG Group, "Introducing MPEG-21 DID".
[2] J. Thiele, "Embedding SpiderMonkey - best practice".
[3] P. Bellini, I. Bruno, P. Nesi, "A language and architecture for automating multimedia content production on grid", Proc. of the IEEE International Conference on Multimedia & Expo (ICME 2006), IEEE Press, Toronto, Canada, 9-12 July 2006.
[4] P. Bellini, P. Nesi, D. Rogai, "Exploiting MPEG-21 File Format for cross media content", Proc. of the International Conference on Distributed Multimedia Systems (DMS 2007), September 6-8, 2007, San Francisco Bay, USA, organised by Knowledge Systems Institute.
[5] P. Bellini, I. Bruno, P. Nesi, M. Spighi, "Intelligent Content Model based on MPEG-21", Proc. AXMEDIS 2008, Florence, Italy, 17-19 Nov. 2008, pp. 41-48, IEEE Press.
[6] M. Kodama, T. Ozono, T. Shintani, Y. Aosaki, "Realizing a News Value Markup Language for News Management Systems Using NewsML", Proc. International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2008), 4-7 March 2008, pp. 249-255.
Activity-oriented Web Page Retrieval
by Reflecting Human Traffic in the Real World
Atsuo Yoshitaka*, Noriyoshi Kanki**, and Tsukasa Hirashima**
*School of Information Science
Japan Advanced Institute of Science and Technology
1-1 Asahidai, Nomi, Ishikawa, 923-1292 Japan
**Graduate School of Engineering
Hiroshima University
1-4-1 Kagamiyama, Higashi-Hiroshima,
Hiroshima, 739-8527 Japan
If the objects being accessed or manipulated are limited in number and can be modified in advance, RFIDs may be implanted into the objects for tracking a user's behavior or movement.
For advanced real-world-oriented information management, we believe the system should be capable of managing information in accordance with the user's context of activity. In the process of information management in the real world, one of the most important features is the method of recognizing the target of the user's scope of interest for information indexing, filtering or retrieval [1]. With respect to information provision, the sources of information are often dedicated information storage obtained through a process of information acquisition. However, we should be aware that the range, quantity and sometimes quality of information accessible via the WWW is not negligible, and that various real-world-related information is provided by individuals, shops, companies and so on.
However, most of the existing web retrieval interfaces do not take the context of human movement in the real world into account. Related to this issue, some web interfaces have been proposed that project web pages onto a geographical map and let a user access various web pages related to shops, train stations, buildings, event halls and so on, associated with icons on a map presented on a mobile computer display. However, as far as we know, there is no work that reflects users' context in the sense of activities in the real world.
In this paper, we describe a framework that accumulates users' activity corresponding to the places or facilities where they stayed with a certain purpose, and retrieves information related to the situation they are facing from the Web. In this study, we regard the WWW as a public information storage and propose a framework of context-aware Web retrieval based on users' activity, viewed as traveling from one place to another. Web contents are retrieved based on the accumulated activities of either a group of users or the user to be assisted.
Currently, major sources of information exist not only in the real world but also in the information space organized by the WWW on the Internet. Information acquisition and retrieval related to the real world need to recognize the user's behavior in order to fulfil his/her needs. In this paper we present a behavior-oriented information retrieval system and its experimental operation. Users' activity in the real world, i.e., their trajectory projected onto a geographical map with indices of places, is tracked by GPS receivers. Movements commonly and frequently observed across users are detected, and they are applied in evaluating the importance of retrieved information that relates to places or facilities in the real world. The proposed system assists a user acting in the real world by retrieving information that helps to decide his/her subsequent actions.
1. Introduction
Mobile computers that are small yet have high computing capacity have become widely available. The diffusion of these devices is one of the dominant factors supporting recent mobile computing environments. In recent years, researchers have been studying real-world-oriented information management, especially in mobile environments. This kind of information management ranges from information acquisition to information provision. One of its directions is to provide a user with information related to his/her current activity at a certain time, place and/or occasion. Sensing the context of a user's activity is achieved by tracking the user's movement projected onto a geographical map. Various sensors are available to capture this kind of activity; a GPS (Global Positioning System) receiver is widely used to track the activity of a user.
From the point of view of accessing Web information, non-context-aware retrieval that does not reflect the user's current location or movement may require a number of trials to refine the keywords submitted to a search engine. On the contrary, the proposed framework, i.e., context-aware retrieval, implicitly provides the search engine with additional keywords that represent the user's expected destination of movement as well as the current location in the real world.
2. Behavior Modeling in the Real World
2.1 User Activity Model in Mobile Environment
Recently, most companies, shops and public places such as city libraries, concert halls and train stations provide information related to themselves on the Web. In addition, portal sites on shopping, travel, cuisine and entertainment, as well as personal blogs, are a non-negligible source of information describing such facilities. They often update their Web pages to provide up-to-date information, and the Web contents often provide information that may affect decisions about our activity in the real world.
From the above-mentioned point of view, in this paper we assume the following user activity model in a mobile environment.
(1) A user moves from one place or facility to another in
accordance with a certain reason such as business,
travel or pleasure.
(2) During the activity, he/she retrieves information on the candidate place or facility he/she is going to visit next. In this process, fewer steps of manipulation and fewer keywords are preferable for ease of use.
(3) The user accesses the information on the candidate place to visit next and, by referring to the information, decides whether or not to visit it, or changes the destination.
The idea of the user activity model is illustrated in Figure 1. This user activity model may be regarded as a general situation of information retrieval on facilities or places to decide one's behavior in a mobile environment. In the following sections, we concentrate on information retrieval following the above-introduced activity model.
In modeling a user's activity, we focus on origin-destination-oriented movement, i.e., movement from one place (or facility) to another, regardless of the route between them. This is based on the assumption that his/her subsequent movement is affected by his/her current location. Attributes of the facilities or places in the real world, such as the name, the postal address, or the type of service, correspond to 'keywords' for information retrieval in the above-mentioned scenario. We regard
the facilities toward which more user traffic is observed from one's current position as the 'near' places he/she may visit as the next action, and the information related to them as more important than other information for the decision of the next destination. Note that this idea is based not on geographical distance but on a logical distance derived from the frequency of human traffic between one place and another. That is, if more traffic is observed between places A and B than between places A and C, where B is farther from A than C, we regard the information related to B as more important than that related to C for a user whose current position in the real world is A. This idea differs from geographical-distance-based information filtering or retrieval.
Figure 1. Relation between movement and retrieval
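The idea of ranking places by traffic frequency rather than geographic distance can be sketched as follows (our own minimal illustration; the paper gives no code, and all names are ours):

```python
from collections import defaultdict

class TrafficGraph:
    """Nodes are facilities; each edge counts observed trips between them."""

    def __init__(self):
        self.trips = defaultdict(int)

    def add_trip(self, origin, destination):
        # Traffic is treated as undirected here (a simplifying assumption).
        self.trips[frozenset((origin, destination))] += 1

    def ranked_neighbors(self, current):
        """Facilities connected to `current`, most frequently travelled first."""
        scored = []
        for pair, count in self.trips.items():
            if current in pair:
                (other,) = pair - {current}
                scored.append((count, other))
        return [place for count, place in sorted(scored, reverse=True)]
```

With trips A-B observed three times, A-D twice and A-C once, a user at A would see B ranked above D and C, regardless of their geographic distances.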
In the subsequent sections, we describe the framework of information filtering based on users' activity in the real world.
2.2 Traffic Graph
We assume a user's task of retrieving Web pages to be part of his/her activity in the real world. In this context, the user's objective in retrieving Web pages is to obtain information related to a facility such as a store, a train station, a school, a city hall, and so on. It is common for various facilities to provide the public with information on timely events or notices via Web pages. This kind of information is valuable for deciding one's subsequent action. Based on this observation, we discuss detecting the facilities where a user stayed for a certain purpose; the detection criteria for staying are described later in this paper. Based on the detection of the facility where a user stayed, we extract a user's stay at a place to model the traffic between facilities for context-aware Web retrieval. The basic idea is based on the following observation. Assume that a person is currently staying at a place
associated with a place (i.e., a facility) Fa and trying to access Web pages in order to obtain information on the place he/she is going to visit next. Under the assumption that frequent traffic, i.e., users' movement from one place to another, is observed between Fa and Fb, we extrapolate that he/she is going to retrieve Web pages related to the facility Fb.
In order to model the users' traffic between facilities, we introduce the traffic graph. An example of a traffic graph is illustrated in Figure 2. In the figure, a node denotes a facility (i.e., a place) in the real world. The first element of the pair of values attached to a link represents the geographic distance between facilities, and the second one represents the traffic frequency. In Figure 2(a), the length of a link corresponds to the geographic distance between facilities in the real world. In Figure 2(b), on the other hand, the length of a link corresponds to the closeness of two places with regard to traffic: the more traffic between the facilities, the closer they are in the sense of travel frequency.
Figure 2. Traffic Graph: (a) location based; (b) traffic based
We regard a higher travel frequency as corresponding to a higher possibility that information related to the facility a user is going to visit is needed. In the above example, Web pages related to Fb and Fd are more likely to be accessed than those related to Fc and Fe, under the assumption that the current location of a user demanding Web information is Fa. The traffic graph is organized by tracking the movement of multiple users. That is, the history of traffic is shared by multiple users in order to derive the traffic density between facilities, which is used to evaluate the importance of Web information in the sense of the human-traffic-based relation between facilities. Based on the traffic graph, we measure the importance of Web pages with respect to the context of users' activity. In the process of organizing a traffic graph, the users sharing traffic history may be grouped based on individual preferences, and the groups may be dynamically reorganized in accordance with transitions of the activity context. Privacy issues can be avoided by anonymizing individual traffic data.
3. Extraction of Human Behavior in the Real World
3.1 Activity Tracking
A user's position in the real world is traced based on positioning data from a GPS (Global Positioning System) receiver. The GPS system detects the current position by evaluating the temporal differences of the radio waves received from several satellites; the more radio waves are received, the more precise the detected position. That is, the error distance between the detected position and the true location where a GPS receiver is placed varies depending on the radio wave conditions. Since the data from a GPS receiver consists of coordinates given by longitude and latitude, the coordinate data is projected onto a geographical map with a latitude-longitude index and rectangular regions corresponding to facilities such as schools, shops, restaurants, public halls, and so on. Each rectangular region is associated with a description that consists of the textual description of the address and the name of the facility.
In order to organize a traffic graph, we need to extract the places where a user stayed. As stated, the position detected by GPS contains a distance error whose amount depends on the conditions of receiving the satellite waves. Therefore, this error needs to be taken into account to diminish misdetection. The positioning error is generally estimated as 2drms, where drms stands for distance root mean square. The error in the positional data, e_p, is estimated by the following formula.
e_p = 2drms = 2 × UERE × HDOP
In the above formula, UERE is the abbreviation of user equivalent range error, which is not obtained from the GPS data; this value is set to 2.0, assuming a general open-air condition. HDOP stands for horizontal dilution of precision, which is obtained from the GPS data. The value of HDOP ranges approximately from 1 to 2 where the receiver can get a sufficient number of satellite waves; where the wave conditions are not satisfactory, it ranges from 7 to 9. Therefore, the average positioning error is approximately 6 meters in good wave conditions and approximately 36 meters in bad conditions. The user's activity in the real world is detected not from the trajectory obtained from GPS data, but from the places where he/she stayed, following the user activity model described in 2.1.
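The error estimate above can be computed directly; for instance (the function name is ours):

```python
def positioning_error(hdop, uere=2.0):
    """e_p = 2drms = 2 * UERE * HDOP, with UERE fixed at 2.0 (open air)."""
    return 2.0 * uere * hdop

# HDOP around 1-2 under good wave conditions, around 7-9 under bad ones:
good = positioning_error(1.5)  # 6.0 meters, matching the ~6 m figure
bad = positioning_error(9.0)   # 36.0 meters, matching the ~36 m figure
```

The same e_p value is used below when deciding whether a user is close enough to a facility's region to start counting a stay.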
3.2 Detection of Stay
As discussed in 3.1, we consider a place where a user stayed for more than a certain duration to be a distinctive place for analyzing his/her activity. Our objective is to detect the mutual strength between places in the sense of human traffic. Therefore, rather than tracing the user's activity by means of the GPS coordinates themselves, we need to detect whether he/she stayed at a place. The state of staying at the place of a facility is detected by taking the size of the facility as well as the positioning error into account. Based on a pre-experiment, we determined the threshold t_stay(p) for a facility p for classifying whether a user stayed at a place or not.
t_stay(p) = (L_R + 2 e_p) / v_walk
In the above formula, L_R denotes the sum of the length and the width of the minimum rectangle that covers the area of the facility, and v_walk denotes the walking speed of a user.
The judgment of stay is carried out as follows. First, when the user is located at a position whose distance from the nearest edge of the rectangular region of a facility is less than e_p, the duration of stay starts to be measured, until he/she moves away from the region. If the duration exceeds t_stay(p), his/her activity is classified as 'stayed' at the facility p.
We carried out an experiment to evaluate the performance of the detection of user stays based on precision and recall. We assumed that all the places of facilities on the route of user traffic are defined as part of the geographic data in advance. We denote the number of stays at facilities extracted by this method as N_extracted, the number of correct stays at places within the detected stays as N_correct, and the number of actual stays at places that the user made in their activity as N_actual. The precision and recall of detecting stays at facilities, Precision_stay and Recall_stay, are defined as follows.
Precision_stay = N_correct / N_extracted
Recall_stay = N_correct / N_actual
According to the results of a 30-day experiment tracking a user's activity, where the user was a graduate student, the system detected 132 stays at facilities. When the positioning error of GPS is not taken into account in the detection process, precision and recall were 0.92 and 0.67, respectively. When the positioning error is taken into account as described, precision and recall were 0.88 and 0.76, respectively. As a consequence, stay detection with GPS error adaptation improves recall with little degradation of precision. This performance may be improved further by taking the direction of the motion trajectory into account.
Figure 3. Overview of the user interface
Figure 4. Displaying the nodes (facilities) near the current location
Figure 5. Displaying a Traffic Graph
4. Situation Aware Web Retrieval
Context-aware Web retrieval is performed based on the Traffic Graph. As described, the geographical map data is prepared with the regions of facilities, and each region is associated with the name and the postal address of the facility. It is common for a Web page related to a facility to contain the postal address as well as the telephone number as a help for visiting. Therefore, empirically speaking, the possibility of the desired Web pages being listed at the top of the retrieval ranking increases by appending the name and the address of the facility as keywords. Figure 3 shows
the overview of the user interface of the situation-aware web browser. The system is implemented on a SONY VGN-U71P with Visual C++. The small black module above the PC is a GPS receiver, which is connected to the PC via USB. Manipulation by the user is performed with a stylus pen. Figure 4 shows all the nodes near the current location, which correspond to facilities or places registered in the system. The name of the facility is shown in each of the nodes in Japanese letters. The scale of the map may be changed if needed.
Figure 5 is an example of showing detected traffic
between facilities, each of the traffic is shown as a link
between facilities. The name of a facility, such as the name
of a store, a school, and so on, is displayed in a rectangle.
In this example, the current position of a user is displayed
at the center of the map.
When a user enters a keyword in the upper-right text box in Figure 5 for Web page retrieval (in the figure, "本", i.e., book, is specified), the names and addresses of the facilities with traffic from/to the current position are appended to the keywords that the user entered explicitly. This process is repeated for all the facilities with traffic to/from the facility at the current position. The top n facilities (or places) in the retrieval result for each traffic flow are arranged in descending order of traffic frequency and presented to the user. The retrieval result is displayed as a list of Web page titles at the right of the interface in Figure 5. Clicking one of the titles in the list displays the corresponding Web page. Figure 6 shows
the difference between an ordinary retrieval, performed by specifying a keyword to Google (on the right), and that of the proposed method. Though the results are displayed in Japanese, a Google search with a single contextual keyword such as 'noodle shop' returns well-known portal sites describing noodle restaurants all over Japan, bulletin board sites on noodles, or a link to the Wikipedia article on noodles. On the other hand, the retrieval result of the proposed method only shows Web pages related to noodle restaurants that many local inhabitants often visit.
In the implementation of the context-aware Web page retrieval functionality, we post an HTTP request to the Google search engine. When a user tries to retrieve Web pages by specifying keywords, his/her current position is detected by matching GPS data against map data, and the name and address of the facility where he/she is staying are extracted. The HTTP request to the Google search engine is invoked by posting the extracted name and address of the facility together with the explicitly specified keywords. This framework enables a user to retrieve Web pages related to the facilities to which frequent travel from the current location is observed. This, in turn, provides context-aware Web retrieval reflecting human activity in the real world, based on the idea that frequent travel between places implies higher priority or importance of the information to be retrieved when searching for information to decide the subsequent actions.
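The augmentation step described above, appending the name and address of facilities with frequent traffic to the user's explicit keyword before querying the search engine, can be sketched as follows (function names, facility names, and addresses are hypothetical; the actual system posts an HTTP request to Google, which is not reproduced here):

```python
def augmented_queries(keyword, current_facility, traffic_graph, n=3):
    """Build one search query per facility with traffic to/from the
    current position, ranked by traffic frequency (most frequent first)."""
    neighbors = sorted(traffic_graph.get(current_facility, {}).items(),
                       key=lambda kv: -kv[1])
    queries = []
    for (name, address), _count in neighbors[:n]:
        # Implicit, situated keywords are appended to the explicit one.
        queries.append(f"{keyword} {name} {address}")
    return queries

# Hypothetical traffic graph: (facility name, address) -> trip count.
graph = {"Home": {("Yamada Books", "1-2-3 Chuo, Sendai"): 5,
                  ("Noodle House", "4-5 Aoba, Sendai"): 2}}
for q in augmented_queries("book", "Home", graph):
    print(q)
```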
In this framework, the density of human traffic from one place to another is also regarded as the strength of the relation between them in the informational sense. However, this valuation method may be biased by the size of the city or the public transportation network, which is one of the open issues to be investigated.
When this system is operated on a large scale, the traffic history of each user will be accumulated in a mobile computer and transmitted to a central server via a wireless network in order to construct the traffic graph. We employed a GPS receiver to detect the place or facility where a user stayed. Currently, this method is a reasonable option that can be widely deployed. However, it cannot detect the exact destination in complex indoor environments or buildings. It might be replaced with another method, such as utilizing widely diffused RF-tags, in the future.
The proposed framework enables users to retrieve desired information with fewer keywords to specify when accessing a search engine. Additional situated keywords are implicitly applied in addition to the explicit keywords given by the user, which improves the hit ratio and diminishes the cost of accessing the requisite Web pages.
Figure 6. Comparison of retrieval results.
4. Related Work
The objective of context-aware web browsers is to adapt to the variety of needs or purposes of users [2-4] in information retrieval. However, the direction of these studies differs from ours, which pursues adapting information provision to a user's activity in the real world; i.e., in those studies, activity in the real world is not taken into consideration as a criterion in information retrieval.
Situation-aware, i.e., location-dependent, Web browsing is studied in [5] and [6]. In [6], the GPS signal is used to acquire the position in the real world for location-dependent Web browsing. However, it does not reflect the traffic or flow of persons in the real world when evaluating the importance of Web information related to human traffic. Therefore, we classify it as static, location-oriented Web browsing, which does not take dynamic human traffic into account. As far as we know, there is no study that retrieves Web pages based on human traffic between places, i.e., the context of user activity in the real world.
This work is partially supported by a Grant-in-Aid for Scientific Research, JSPS.
5. Conclusion
We described a novel framework of information retrieval based on a user's dynamic activity in the real world.

References
[1] P. J. Brown and G. J. F. Jones, "Context-aware Retrieval: Exploring a New Environment for Information Retrieval and Information Filtering," Personal and Ubiquitous Computing, Vol. 5, Issue 4, pp. 253-263, 2001.
[2] G. N. Prezerakos, N. D. Tselikas, G. Cortese, "Model-driven Composition of Context-aware Web Services Using ContextUML and Aspects," Proc. IEEE International Conference on Web Services, pp. 320-329, 2007.
[3] A. Thawani, S. Gopalan, and V. Sridhar, "Web-based Context Aware Information Retrieval in Contact Centers," Proc. International Conference on Web Intelligence, pp. 473-476, 2004.
[4] T. Koskela, N. Kostamo, O. Kassinen, J. Ohtonen, and M. Ylianttila, "Towards Context-Aware Mobile Web 2.0 Service Architecture," Proc. International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, pp. 41-48, 2007.
[5] A. Haghighat, C. Lopes, T. Givargis, and A. Mandal, "Location-Aware Web System," Proc. Workshop on Building Software for Pervasive Computing, OOPSLA'04, 2004.
[6] D. Carboni, S. Giroux, et al., "The Web around the Corner: Augmenting the Browser with GPS," Proc. 13th International WWW Conference, pp. 318-319, 2004.
An Architecture for User-Centric Identity, Profiling and Reputation Services
Gennaro Costagliola, Rosario Esposito, Vittorio Fuccella, Francesco Gioviale
Department of Mathematics and Informatics
University of Salerno
{gencos,vfuccella,cescogio}, [email protected]
This paper presents a work in progress whose objective is the definition of a novel architecture for solving several challenges related to Web navigation, such as accessing multiple Web sites through a single identity and verifying the identity and the reputation of a peer involved in a transaction. The proposed model tries to solve the above challenges in an integrated way through the introduction of a specialized Web Mediator acting on behalf of the user during usage of the Net, identity providers for identity data centralization, and a two-way negotiation system among parties for mutual trust.
1. Introduction
The need to introduce new functionalities to improve the user Web experience is more and more widely felt. Lately, researchers have been closely examining the following important issues:
1. Registering for and accessing multiple services using a single identity for all services (single sign-on systems);
2. Verifying the identity and the reputation of a peer (user or organization) involved in a transaction;
3. Keeping ownership and control of personal information such as the user profile, reputation, etc.
In this paper we propose an architectural model aimed at pursuing the above objectives through the introduction of a Web Mediator (WM), acting on behalf of the user during Web navigation, and an Identity Provider for identity data centralization. The former is responsible for maintaining the user's personal data and profile to be used in content personalization (as similarly done in [1]). The latter is responsible for keeping the user's identity and reputation data, and for vouching for the user in registration and authentication procedures. Our model enables a two-way negotiation system among parties for mutual trust: in a transaction, both parties can mutually authenticate and verify reputation and profile. This sort of handshake will allow them to decide whether the transaction can go on or should stop. It is worth noting that, despite adding new functionalities to the current Web application interactions, the architecture works with the current Web protocols.
The advantages deriving from the availability of a solution to the three issues mentioned above are evident in several scenarios occurring daily during Web navigation. For instance, mutual trust is useful in the detection of phishing: let us suppose a user receives an e-mail containing a link to an important document about his/her bank account stored on the bank Web site. By connecting to the link with our framework enabled, the user can both check whether the remote Web server supports the architecture and verify its credentials. The phishing attempt can be immediately detected in the former case, and after a reputation check in the latter case. The availability of user profile and reputation is useful in many cases: e.g., the profile is used for offering personalized services, the reputation in on-line auction services. Their availability to the user is advantageous since data are already available when a user starts requesting a service at a new provider (it is not necessary to wait for a new profile or reputation to be built), and the user is the owner of his/her personal data, which can be used with different sites offering the same services.
The above-mentioned issues have been faced separately so far; that is, to our knowledge, there is no proposal in the literature of a generic architecture offering a solution for them all. E.g., platforms for single sign-on [6] and for trust and reputation management [3] are available, as well as methods for preventing phishing [5]. In order to propose a unified solution to the above challenges, we have decided to extend a well-established SSO platform, OpenID [6], with the support of a mutual trust establishment procedure. In particular, we have extended the OpenID Authentication procedure. The interaction among the user's and peer's modules involved in the procedure is described throughout the paper. In our prototype, the Web browser can communicate with the user's WM through a special plug-in.
The rest of the paper is organized as follows: in section 2, we introduce the OpenID platform; the architectural model, including a detailed description of the involved entities and their interaction model, is presented in section 3. In section 4, we describe the implemented prototype and its instantiation in a real-life application scenario. Final remarks and a discussion on future work conclude the paper.
2. The OpenID Platform
OpenID was first developed in 2005 as a user-centric and URI-based identity system. Its main objective was to support the SSO functionality. The initial project has grown and evolved into a framework enabling the support of several functionalities which can be added to the basic platform.
The OpenID architecture components are: the user, the remote Web server (also known as the Relying Party) where the user wants to authenticate, and the Identity Provider (IdP) that vouches for the user's identity. OpenID has a layered architecture. The lowest layer is the Identifier layer. This layer provides a unique identifier for the address-based identity system. The address identifier (OpenID URL) is used by the Relying Party (RP) to contact the user's Identity Provider and retrieve identity data. Both URL and XRI [7] address formats are supported as identifiers.
The layer above is the service discovery layer. It is implemented through the Yadis protocol [4]. The purpose of this layer is to discover the various types of services reachable through an identifier. In the case of OpenID it is used to discover the Identity Provider location.
The third layer is the OpenID Authentication layer. The main purpose of this layer is to prove that a user is the owner of an OpenID URL and, consequently, of the connected user identity.
The fourth layer is the Data Transfer Protocol (DTP). This protocol is used to transmit user-related data from the IdP to the RP. In OpenID Authentication 1.1 this layer is implemented through the SREG protocol (Simple Registration Protocol), which allows the transmission of simple account-related data [2]. Currently, the OpenID research community is defining a new version of the protocol capable of transmitting various types of data other than identity-related ones.
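For reference, the SREG exchange at the Data Transfer Protocol layer rides on the ordinary OpenID Authentication request; a request carrying SREG fields looks roughly as follows (a sketch of OpenID 1.1 / SREG 1.0 parameters with placeholder URLs, not an endpoint of the architecture described here):

```python
from urllib.parse import urlencode

# OpenID 1.1 checkid_setup request extended with SREG 1.0 fields.
# All URLs below are placeholders, not real endpoints.
params = {
    "openid.mode": "checkid_setup",
    "openid.identity": "http://alice.example.org/",
    "openid.return_to": "http://rp.example.com/return",
    "openid.trust_root": "http://rp.example.com/",
    # Simple Registration extension: account data the RP asks the IdP for.
    "openid.sreg.required": "nickname,email",
    "openid.sreg.optional": "fullname,country",
}
print(urlencode(params))
```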
3. The architecture
In this section we give a description of the proposed architectural model, including the involved entities and their interactions in a trusted negotiation, which is a typical interaction where two parties gradually establish trust [8]. It is based on the previously described OpenID platform, and extends it to support the features outlined in the introduction.

Figure 1. The OpenID layered architecture.
Our model extends the OpenID platform by enabling the establishment of mutual trust and the exchange of reputation and profile data between two parties. In particular, it adds Profile and Reputation layers on top of the uppermost OpenID layers and a Mutual Trust layer above them (Fig. 2).
The reputation management service is provided as an extension of the DTP layer. In particular, the data model supported in the information exchange occurring at this layer is extended with reputation data. The discussion on how to represent, create and manage these data is out of the scope of this paper and will not be treated here.
User profile data are managed by the WM, which also works as a profile provider, and can be accessed only after the OpenID Authentication procedure is successfully completed.
The Mutual Trust layer implements the handshake procedure that authorizes the user application to proceed with an interaction after the identity, reputation and profile of the remote peer are checked.
In a typical scenario, our architecture is composed of the following components:
A) The Web Browser, equipped with a specific plug-in (e.g. a Firefox add-on) to communicate with the WM;
B) A Web Mediator (WM): the software module responsible for communicating with other remote peer WMs in order to perform a trusted negotiation. The WM can perform two functions: issue a transaction request to remote peer WMs, or receive incoming transaction requests from remote peer WMs. When it is the first to send a request, we will refer to the WM as the User Web Mediator (UWM); otherwise we will refer to it as the Remote Web Mediator (RWM). More in detail, a WM, by referring to a preference table set by the user, verifies the identity, reputation and profile of remote peers and, after all checks are passed, authorizes the application to proceed with the transaction. Furthermore, in scenarios that need this feature, it also checks that the resource retrieved as a transaction result fits the user's preferences (e.g. content filters);
C) An Identity Provider (IdP), deployed on a third-party server, that is responsible for guaranteeing the veracity of the credentials issued by the WMs; it is also responsible for providing, by extending the common data already passed during an OpenID authentication, the reputation data;
D) The remote application, which provides the requested resource after being authorized to do so by the RWM.

Figure 2. The proposed architecture.

Before discussing the fundamental phases that occur in a transaction, we describe the WM Handshake procedure, in which the UWM and the RWM establish mutual trust with the help of one or more IdPs. During this phase the WMs exchange profile and reputation data and verify that the user parameters are satisfied. More in detail, as shown in figure 3:
1. UWM requests the OpenID URL from the RWM and receives it;
2. UWM starts the authentication procedure by contacting RWM's IdP, which authenticates RWM and replies with RWM's reputation data;
3. UWM retrieves RWM's profile data through a GET request to the RWM using a standard URL;
4. UWM checks the received profile and reputation data and, if all checks are passed, sends its own OpenID URL;
5. RWM starts the authentication procedure by contacting UWM's IdP, which authenticates UWM and replies with UWM's reputation data;
6. RWM retrieves UWM's profile data through a GET request to the UWM using a standard URL;
7. RWM checks the received profile and reputation data and, if all checks are passed, sends an OK message to UWM.

Figure 3. The WM Handshake.

The authentications in steps 2 and 5 follow the OpenID protocol and consist of sending username and password to the IdP (through a POST request) to prove ownership of the identity related to the previously sent OpenID URL. For the sake of clarity, no exceptions are shown in the procedure. If something goes wrong, the UWM is in charge of notifying the user application that the handshake did not succeed.
Note that, by following the previous steps, the UWM is the first to see the other party's reputation and profile data. Furthermore, the RWM will be able to access the UWM data only if it is considered worthy of receiving them. This is the UWM-first version of our architecture. The RWM-first version is easily obtained by letting the UWM start by sending its own OpenID URL and modifying the subsequent steps accordingly.
In the following, we describe the complete transaction between two Web applications (the user and remote applications) following the UWM-first approach (the other case can be easily derived). More in detail, as shown in figure 4:
1. the user makes a request to the application to execute a transaction with a remote application;
2. the user application contacts its UWM to obtain an authorization for the transaction;
3. the WM Handshake between the corresponding UWM, RWM and IdPs occurs as described above;
4. if the handshake succeeds, the UWM sends the shared RWM OpenID authorization token to the user application;
5. the user application sends its original request together with the authorization token to the remote application;
6. the remote application uses the token to query its RWM for the identification and profile of the requester (as built with the UWM);
7. the RWM returns the required resource; from now on the transaction between the two applications does not involve the underlying levels.
In case the WM Handshake does not succeed, the user application, based on its configuration, may decide whether or not to start a traditional transaction with the remote application. In fact, one of the advantages of this approach is that it does not alter the current Web model.
In our lab, we have built a basic prototype implementing the procedures above in the context of OpenID and applied it to the case of browsing a simple web application.
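The seven handshake steps can be summarized in code; the sketch below models them with in-memory stand-ins for the WMs and the IdP (all class and method names are our own illustration, not the prototype's API, and the reputation check is reduced to a single numeric threshold from the preference table):

```python
class IdP:
    """Toy Identity Provider: openid_url -> (password, reputation score)."""
    def __init__(self, accounts):
        self.accounts = accounts
    def authenticate(self, openid_url, password):
        pw, reputation = self.accounts[openid_url]
        if pw != password:
            raise PermissionError("authentication failed")
        return reputation  # the IdP replies with the reputation data

class WM:
    def __init__(self, openid_url, password, idp, profile, min_reputation):
        self.openid_url, self.password = openid_url, password
        self.idp, self.profile = idp, profile
        self.min_reputation = min_reputation  # simplified preference table
    def checks_pass(self, reputation, profile):
        return reputation >= self.min_reputation

def handshake(uwm, rwm):
    """UWM-first WM Handshake; True means both sides said OK."""
    rwm_url = rwm.openid_url                               # step 1
    rwm_rep = rwm.idp.authenticate(rwm_url, rwm.password)  # step 2
    rwm_profile = rwm.profile                              # step 3 (GET)
    if not uwm.checks_pass(rwm_rep, rwm_profile):          # step 4
        return False
    uwm_rep = uwm.idp.authenticate(uwm.openid_url, uwm.password)  # step 5
    uwm_profile = uwm.profile                              # step 6 (GET)
    return rwm.checks_pass(uwm_rep, uwm_profile)           # step 7 (OK)

idp = IdP({"http://u.example/": ("pw-u", 0.9),
           "http://r.example/": ("pw-r", 0.8)})
uwm = WM("http://u.example/", "pw-u", idp, {"role": "buyer"}, 0.7)
rwm = WM("http://r.example/", "pw-r", idp, {"role": "seller"}, 0.7)
print(handshake(uwm, rwm))  # True
```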
Figure 4. The general architecture.

4. The Online Auction Websites case study
In this section we show how our architecture can be easily instantiated in a real-life application.
4.1. The case
Alice is an Ebaia power seller with a positive feedback rate of 99%. Thanks to her excellent reputation, Alice reaches big sales volumes. While surfing the Web, Alice finds a new online auction system, called Xbid, that offers more convenient commissions on sales. Alice, interested in the offer, decides to test the new system, but then she finds a serious obstacle: there is no way to migrate her excellent reputation data (built up over a long time span) from the current system to the new one. Discouraged, she decides not to try Xbid.
The adoption of our model, thanks to the relocation of the reputation data to an Identity Provider, allows the user to access multiple online auction systems, even at the same time, increasing the seller's presence on the market. Also, thanks to the centralized reputation data, users can compare sellers on different auction platforms, allowing a deeper level of filtering. Last but not least, thanks to the buyers' certified identity, the seller is able to exclude malicious users that could alter the auctions.
4.2. The implementation
The user application is the Web browser (the buyer's one, in this case) and the remote application is the auction system Web server, which will request from the seller's RWM the authorization to proceed with the transaction. The seller's RWM will be identified by the UWM through a metatag link present in the product page, as usually done with OpenID delegation. The transaction steps are then instantiated as follows:
1. the user selects the 'buy now' option;
2. the browser contacts the user's UWM, through a plug-in, to obtain an authorization for the transaction;
3. the WM Handshake occurs;
4. if the handshake succeeds, the UWM sends the shared RWM OpenID authorization token to the browser;
5. the browser sends the 'buy' request together with the authorization token to the auction web system;
6. the auction system uses the token to query its RWM to receive the authorization for the incoming request;
7. the auction system shows the payment procedure to the user.
5. Conclusions
In this paper we have presented an architecture for improving some aspects related to Web navigation. The work is still in progress and, due to the complexity of the different addressed issues, many aspects are still to be investigated: some scenarios have been outlined, and the architectural model has been presented and tested in one of them. As future work, we plan to test the architectural model in many other scenarios and contexts.
References
[1] A. Ankolekar and D. Vrandečić. Kalpana - enabling client-side web personalization. In HT '08: Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, pages 21-26, New York, NY, USA, 2008. ACM.
[2] J. Hoyt, J. Daugherty, and D. Recordon. OpenID simple registration extension 1.0. June 2006.
[3] A. Jøsang, R. Ismail, and C. Boyd. A survey of trust and reputation systems for online service provision. Decis. Support Syst., 43(2):618-644, 2007.
[4] J. Miller. Yadis 1.0. March 2006.
[5] Y. Oiwa, H. Takagi, H. Watanabe, and H. Suzuki. PAKE-based mutual HTTP authentication for preventing phishing attacks. In 18th International World Wide Web Conference (WWW2009), April 2009.
[6] D. Recordon and D. Reed. OpenID 2.0: a platform for user-centric identity management. In DIM '06: Proceedings of the second ACM workshop on Digital identity management, pages 11-16, New York, NY, USA, 2006. ACM Press.
[7] D. Reed and D. McAlpin. Extensible resource identifier syntax 2.0 (OASIS XRI committee specification). November 2005.
[8] A. C. Squicciarini, A. Trombetta, E. Bertino, and S. Braghin. Identity-based long running negotiations. In DIM '08: Proceedings of the 4th ACM workshop on Digital identity management, pages 97-106, New York, NY, USA, 2008. ACM.
The ENVISION Project: Towards a Visual Tool to Support Schema Evolution in
Distributed Databases
Giuseppe Polese and Mario Vacca
Dipartimento di Matematica e Informatica, Università di Salerno
Via Ponte don Melillo, 84084 Fisciano (SA), Italy
{gpolese, mvacca}
Changes to the schema of databases naturally and frequently occur during the life cycle of information systems; supporting their management, in the context of distributed databases, requires tools to perform changes easily and to propagate them efficiently to the database instances. In this paper we illustrate ENVISION, a project aiming to develop a visual tool for schema evolution in distributed databases to support the database administrator during the schema evolution process. The first stage of this project concerned the design of an instance update language allowing schema changes to be performed in a parallel way [14]; in this paper we deal with further steps toward the complete realization of the project: the choice of a declarative schema update language and the realization of the mechanism for the automatic generation of instance update routines. The architecture of the system, which is being implemented, is also presented.
1. Introduction
Updating a schema is a very important activity which
naturally and frequently occurs during the life cycle of information systems, due to different causes, like, for example, the evolution of the external world, the change of user
requirements, the presence of errors in the system. Two of
the problems arising when a schema evolves are the semantic of changes (how to express the changes to the schema)
and the change propagation (how to propagate the schema
changes to the instances) [18]. These two tasks are performed using schema evolution languages and tools. Developing a tool for schema evolution in distributed databases
is an important and challenging task for the following reasons: first, the shortage of tools for schema evolution is
a well known problem [2, 6]; second, the rare existing
tools are limited1 ; changes in distributed database schemas
can provoke significant effects because updating instances
can involve the processing of an enormous mass of data
among distributed nodes, making the process of propagating changes to the instances a very expensive one. As a
consequence, database administrators (DBAs) have to cope
both with the difficulty of performing schema changes and
the efficiency of the change propagation process.
In order to develop such a tool, it is necessary to design a schema evolution language, which, according to Lagorce et al., is composed of two languages, the instance update language and the schema update language, and a mechanism allowing schema update statements to be translated into instance update ones [11].
The ENVISION (EfficieNt VIsual Schema evolutION for distributed databases) Project aims to develop a visual tool to support the DBA during the schema evolution process. The first stage of this project2 concerned the design of an instance update language, based on Google's MapReduce programming paradigm [5, 7], allowing instance updates to be performed in a parallel way [14]. At this stage, the project still suffered from the drawbacks of the procedural features of the language.
In this paper we illustrate the second stage of the project, aiming to overcome these problems: we propose to adopt a logical schema update language, both suitable for describing schema changes and straightforwardly translatable into the instance update one. The result is, hence, the possibility to perform changes to the schema in a declarative way and to let the system generate the MapReduce instance update routines, combining simplicity of use and efficiency.
The paper is organized as follows: after a short introduction to schema evolution and related problems in distributed databases (sections 2 and 3), in section 5 a declarative schema update language is proposed, together with the algorithm for the automatic generation of the instance update routines (introduced in section 4). Section 6 gives an account of the proposed architecture, and the conclusions end the paper.

1 For example, the ESRI package ArcGIS (http://www.esri.com/software/arcgis) includes tools for geodatabase schema changes, but it supports only a small set of schema changes.
2 It was developed in collaboration with the Dip. di Costruzioni e Metodi Matematici in Architettura of the Federico II University of Naples.
2. Schema evolution: short state of the art and problems
Schema evolution takes place when a schema S evolves
towards a schema T (S and T are called schema versions).
Two important issues of schema evolution are the management of changes to the schema (a.k.a. semantics of schema
changes) and the propagation of changes to the data (a.k.a.
change propagation) [18]. The first one refers to the way
the changes are performed and their effects on the schema
itself, while the second deals with the effects of schema
changes on the data instances. These two tasks are realized
by schema evolution languages which are, in turn, composed of two languages, the schema update language and
the instance update language, and of a translation mechanism allowing to convert schema update statements into
instance update ones [11].
Figure 1 describes the schema evolution issues and the
role of the schema evolution language: S and T are the
two schema versions, I and J are the database instances,
mapST denotes a set of statements in the schema update
language and instance update routines are the statements by
which the database instances are updated accordingly. The
big arrow indicates the translation mechanism between the
two languages.
Figure 1. Schema evolution language
According to Lerner [13], there are two classes of schema update languages, differing in their concept of change: the command approaches, which focus "on the editing process" ([13], p. 86), and those which focus "on the editing result" ([13], p. 86). The approaches belonging to the first class3 define elementary change operations (like deleting an attribute) by specifying their effects both on the schema and on the data. Changes to the schema can be simple (like adding an attribute) or compound [13], like merging two relations, which are very important in practical contexts. Two basic features of this kind of changes are their procedural nature and their dependence on the data model. The second kind of approach is based on the idea that an evolution is a correspondence of schemata (a mapping). The first approach of this kind was due to Bertino [3]; the idea was also used in [12] and, later, by Lerner [13] from other points of view. The use of schema mappings to represent schema changes has been increasing more and more, also owing to the birth of Generic Model Management based approaches, which use schema mappings along with operators to perform schema evolution [2]. Moreover, in a recent research project [6], Schema Modification Operators have been proposed, whose semantics is expressed by schema mappings. An advantage of the mapping-based approaches is their declarativeness, which makes schema changes easier to realize (e.g. using visual editors).

3 See [1, 19, 20] for taxonomies of schema change operations.
When a change is applied to a schema, it has to be propagated to the data, either by the DBA [9] or automatically [1, 13]. There are different methods to realize the propagation of schema changes to the instances (see, for example, [13]); in this paper we are interested in the conversion method, where a schema change invokes the update of all the objects affected by the change itself. Two notable examples of instance update languages are those used in the O2 system [20] and in the TESS system [13].
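Under the conversion method, a schema change immediately triggers the update of every affected instance; for a relational "drop attribute" change, the propagation amounts to rewriting all stored tuples without that attribute. A toy illustration (not the ENVISION language; the relation and its tuples are invented for the example):

```python
# Instances of a relation, stored as a list of dicts (one dict per tuple).
# Values are illustrative only.
cities = [
    {"city": "Salerno", "prov": "SA", "pop": 133000},
    {"city": "Naples",  "prov": "NA", "pop": 960000},
]

def drop_attribute(instances, attr):
    """Conversion method: the schema change 'drop attr' is propagated
    by rewriting every tuple affected by the change."""
    return [{k: v for k, v in row.items() if k != attr} for row in instances]

cities = drop_attribute(cities, "pop")
print(cities)  # [{'city': 'Salerno', 'prov': 'SA'}, {'city': 'Naples', 'prov': 'NA'}]
```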
3. Features and problems of distributed
database schema evolution
Distributed databases are applied to a wide variety of domains: from classical administrative databases [18] to e-learning repositories [10] or geographic databases (see, for example, [14]).
There are many kinds of distributed architectures (see
[17] for a detailed account), all of them sharing the feature that data are fragmented across (geographically) distributed nodes. In this paper we are interested in all the
cases where a central node manages a schema and the data
of the database are spread across the local nodes, using fragmentation criteria [17]. The hypothesis of the presence of a central node is not a limitation, as this situation holds for a large number of architectures (for an example, see the POOL architecture [10]).
The interest of schema evolution research in distributed databases has been growing in recent years, as the inclusion of this topic in the most recent survey on schema evolution shows [18]. Within the context of distributed databases, the schema evolution issues of section 2 become more challenging: first, the change propagation process, involving a potentially enormous mass of data distributed across nodes, is very expensive and calls for efficient processing; second, the translation mechanism is more difficult because the updating routines are more complex. Therefore, the need for a supporting tool, which allows the DBA to formulate the schema changes easily and to propagate them to the data automatically and efficiently, becomes more and more urgent.
nodes, which take data, to be passed to the user merge function, from two sources (the locations where reducers stored
them) using both a partition selector and an iterator.
4. A MapReduce-based instance update language for distributed databases
MapReduce is a programming model [7] developed by
Google to support parallel computations over vast amounts
of data on large clusters of machines. The MapReduce
framework is based on the two user defined functions map
and reduce and its programming model is composed of
many small computations using these two functions. In general, the MapReduce execution process (see [7] for details) designates just one of the copies of the user program calling the map and reduce functions as the master, while the rest are workers (there are M mappers and R reducers) to which the master assigns work.
The MapReduce model has been extended for processing
heterogeneous datasets [5] and it is based on three user defined functions (map, reduce and merge) with the following
semantics (see [5] for details): a call to a map function processes a key/value pair (k1, v1) returning a list of intermediate key/value pairs [(k2, v2)]; a call to a reduce function
aggregates the list of values [v2] with key k2 returning a list
of values [v3], always with the same key; a call to a merge function, using the keys k2 and k3, combines them into a list of key/value pairs [(k4, v5)]. Notice that a merge is executed on the two intermediate outputs ((k2, [v3]) and (k3, [v4])) produced by two map-reduce executions.
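To make the three signatures concrete, the following sketch (illustrative only: the names, data and in-memory execution are our own simplification of the distributed framework) simulates a map-reduce pass and a merge over plain Python lists:

```python
from itertools import groupby
from operator import itemgetter

def run_map_reduce(pairs, map_fn, reduce_fn):
    """Apply map_fn to each (k1, v1) pair, group the intermediate
    (k2, v2) pairs by key, and reduce each group to (k2, [v3])."""
    intermediate = []
    for k1, v1 in pairs:
        intermediate.extend(map_fn(k1, v1))       # [(k2, v2)]
    intermediate.sort(key=itemgetter(0))          # group by k2
    return [(k2, reduce_fn(k2, [v for _, v in grp]))
            for k2, grp in groupby(intermediate, key=itemgetter(0))]

def run_merge(left, right, merge_fn):
    """Combine the outputs (k2, [v3]) and (k3, [v4]) of two
    map-reduce executions into a list of (k4, v5) pairs."""
    merged = []
    for k2, v3 in left:
        for k3, v4 in right:
            merged.extend(merge_fn(k2, v3, k3, v4))
    return merged
```

With an identity map and reduce and a merge function that emits a pair only when the two keys coincide, run_merge performs an equi-join of the two intermediate outputs.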
In [14], the Map-Reduce-Merge model has been exploited as an instance update language for geodatabases.
The proposed execution process, inherited from [5, 7], is
the following:
- Map task
When a map is encountered, the master assigns the map
tasks to the M workers (mappers). A map task consists in
reading data from the input locations, passing them to the
user map function and, then, storing them, sorted by the
output key, at some locations on some nodes.
- Reduce task
The master passes the locations where the mappers have
stored the intermediate data to the R reduce workers (reducers) which are assigned to some nodes. The reducers, using
an iterator, for each unique intermediate key, pass both the
key itself and the corresponding list of values to the user’s
reduce function. The result of the user reduce function is
stored on some nodes.
- Merge task
When the user program contains a merge call, the master launches the merge workers (mergers) on a cluster of nodes, which take data, to be passed to the user merge function, from two sources (the locations where the reducers stored them) using both a partition selector and an iterator.
Example 1 Consider the schema S storing information about cities

S = {Cities(city, prov, pop), Provinces(prov, reg)}

and the schema T obtained from S by joining its relations on the attribute prov:

T = {NewCities(city, prov, pop, reg)}
The instance update related to this change can be realized
by the following sequence of map, reduce and merge routines:
use input Cities;
map(const Key& key, const Value& value) {
    prov = key;
    city = value.city;
    pop = value.pop;
}
/* This map reads the Cities tuples from the input locations and stores them, sorted by the output key prov, at some locations on some nodes. */

reduce(const Key& key, const Value& value) {
    Emit(key, value);
}
/* This reduce function, for each unique intermediate key prov, builds the corresponding list of values. */

use input Provinces;
map(const Key& key, const Value& value) {
    prov = key;
    reg = value.reg;
}
/* Analogous to the previous map. */

reduce(const Key& key, const Value& value) {
    Emit(key, value);
}
/* Analogous to the previous reduce. */

merge(const LeftKey& leftKey, const LeftValue& leftValue,
      const RightKey& rightKey, const RightValue& rightValue) {
    if (leftKey == rightKey) {
        /* The merge joins the result of the two previous
           reduce functions on prov. */
    }
}

use output NewCities;
divide NewCities;
/* The table NewCities is fragmented. */
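The net effect of the routines above can be checked on a toy instance (the tuples below are hypothetical; the real routines run distributed across nodes, while this sketch is a plain in-memory equi-join):

```python
# Hypothetical instances of the source relations.
cities = [("Turin", "TO", 870000), ("Moncalieri", "TO", 57000),
          ("Milan", "MI", 1350000)]                   # Cities(city, prov, pop)
provinces = [("TO", "Piedmont"), ("MI", "Lombardy")]  # Provinces(prov, reg)

def key_by_prov(tuples, prov_index):
    """Map/reduce phases in miniature: re-key every tuple by prov and
    collect, per key, the list of tuples (the reduced output)."""
    keyed = {}
    for t in tuples:
        keyed.setdefault(t[prov_index], []).append(t)
    return keyed

def merge_on_prov(left, right):
    """Merge phase: join the two reduced sides on the shared key prov."""
    joined = []
    for prov, city_tuples in left.items():
        for _, reg in right.get(prov, []):
            for city, _, pop in city_tuples:
                joined.append((city, prov, pop, reg))  # a NewCities tuple
    return sorted(joined)

new_cities = merge_on_prov(key_by_prov(cities, 1), key_by_prov(provinces, 0))
```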
5. The schema evolution language
The features of distributed database schema evolution of
section 3 lead to the following requirements for the schema
evolution language4:
- the language to express schema changes has to be
declarative, possibly visual;
- the mapping between schema versions has to have a
formal (logical) characterization;
- instance update (MapReduce-based) routines have to
be generated automatically;
- it must always be possible to choose the level at which to operate: visual (schema mapping) or instance.
The independence of the instance update language from the visual schema update (the DBA must be free to choose either of them) is particularly important, as very complex schema changes could be required which are not supported, or not supported efficiently enough, by the tool.
5.1. The schema mapping language
An important problem to cope with when designing the
schema evolution language is the choice of the formal language for the mappings between schema versions. Mappings link two schemas S and T and are represented by “set
of formulas of some logical formalism over (S, T )” (Fagin et al. [8], p. 999) describing the relation between the
instances of the two schemas themselves (see Figure 1).
There are many logical schema mapping languages (see,
for example, [16] for a list), each of them suitable for some
purposes. Among them, the second-order tuple-generating dependency (SO tgd) language [8] has many desirable properties: it can express many schema changes (note that the SO tgd class includes that of GLAV mappings, which are sufficient to link schemas for practical goals [15]); it has been proved to be closed under composition [8]; its statements can be easily decomposed (see [8]); and it allows the use of functions. Moreover, we will show that its statements can also be easily translated into MapReduce-based instance update routines.
A second order tuple-generating dependency (SO tgd)
(see [8] p. 1014 for details) is a formula of the form:
∃f (∀x1 (φ1 → ψ1 ) ∧ . . . ∧ ∀xn (φn → ψn ))
4 These desiderata are a remake of Curino et al.'s D1.1, D1.4, D3.4, D3.7 [6].
where f is a set of functions, φi (resp. ψi ) (i = 1, . . . , n) is
a conjunction of atomic formulas of the form Sj (y1 , ..., yk )
(resp. Tj (y1 , ..., yk )), with Sj (resp. Tj ) k-ary relations of
S (resp. T ) and y1 , . . . , yk variables in xi (resp. terms on
xi and f ).
The language we propose to use is based on SO tgds but, since we use it in practical applications, we need instantiated SO tgd formulas (we call them ISO tgds): first, the set of functions f has to be instantiated (the DBA has to write them, if necessary); second, in order to perform the join (the left-hand side φ of an SO tgd is a conjunction), the DBA has to specify the merge attributes (this is done using equality constraints stating which attributes have to be considered equal; if no constraint is specified, the attributes with equal name and type are considered equal, and if no such attributes exist, the join is interpreted as a cross join).
Definition 1 Let S and T be two schemas. An ISO tgd mapping is a triple (Σ, E, F ), where Σ is a set of SO tgds, E
is a set of equality constraints on S, and F is a set of assignments of the kind y = f(x) (f is a function, x is a list
of attributes of relations in S and y is an attribute of some
relation in T ).
A simple example of such a function is one that assigns default values when a column is added to a table.
Example 2 Consider the schema evolution of Example 1. The ISO tgd mapping describing the passage between the two schema versions S and T is:

Σ = {∀city, prov, pop, reg(
Cities(city, prov, pop) ∧ Provinces(prov, reg) →
NewCities(city, prov, pop, reg))}

E = {Cities.prov = Provinces.prov} and F = ∅.
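An ISO tgd mapping of Definition 1 can be carried around as plain data; the encoding below is a hypothetical sketch of the mapping of this example (the field names are our own invention, not part of the formalism):

```python
# Hypothetical encoding of the ISO tgd mapping (Σ, E, F).
iso_tgd_mapping = {
    "sigma": [{"premise": ["Cities", "Provinces"],   # left-hand side
               "conclusion": "NewCities",            # right-hand side
               "target_attrs": ["city", "prov", "pop", "reg"]}],
    "E": [("Cities.prov", "Provinces.prov")],  # equality constraints
    "F": {},                                   # no function assignments
}

def join_attributes(mapping):
    """Attributes on which the join will be performed, read off E."""
    return [left.split(".")[1] for left, right in mapping["E"]
            if left.split(".")[1] == right.split(".")[1]]
```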
5.2. The translation mechanism
Even though the Map-Reduce-Merge language is procedural, its very simplicity (being based on only three functions) suggests the possibility of generating instance update routines automatically. The idea behind the automatic generation is to use "basic" routines (we call them propagator chunks) which, properly combined, generate the desired instance update routines.
Definition 2 (Propagator chunks) Let S be a relation and
let [y1 , . . . , yn ], k be, respectively, a list of attribute names
and an attribute name (a key); let [f1 (x1 ), . . . , fn (xn )] be
a list of function names fi , each with its argument name list
xi (i = 1, . . . , n):
- map-chunk(R, k, [y1, . . . , yn]) is the routine:

use input R;
map(const Key& key, const Value& value) {
    k = key;
    y1 = value.y1; ... yn = value.yn;
}

The map-chunk reads the data from the input locations of the table R and stores the values of the attributes k, y1, . . . , yn, sorted by the output key k, at some locations on some nodes.
- reduce-chunk([y1, . . . , yn], [f1, . . . , fn]) is the routine:

reduce(const Key& key, const Value& value) {
    y1 = f1(x1); ... yn = fn(xn);
    Emit(key, (y1, ..., yn));
}

Moreover, if [y1, . . . , yn] is empty, the reduce-chunk ends with Emit(key, value) instead of Emit(key, (y1, . . . , yn)), and if fi = nil (the no-operation function), there is no assignment yi = fi(xi).
The reduce-chunk, using an iterator, for each unique intermediate key k, passes the list of values to the user reduce functions f1, . . . , fn; the result of the user reduce functions is stored on some nodes.
- merge-chunk(E) is the routine:

merge(const LeftKey& leftKey, const LeftValue& leftValue,
      const RightKey& rightKey, const RightValue& rightValue) {
    if (E) { ... }
}

The merge-chunk takes data from two sources (the locations where the reducers stored them) and merges them using the set E of equality constraints.
- divide-chunk(R) is the routine:

use output R;
divide R;

The divide-chunk(R) fragments the table R across the nodes.
The following IURG algorithm is an instance update routine generator using the propagator chunks.

Algorithm IURG(Σ, Ω);
INPUT: an ISO tgd mapping (Σ, E, F);
OUTPUT: the set Ω of Map-Reduce-Merge routines ρ;
Σ′ := ∅;
for each σ ≡ φ → T1 ∧ . . . ∧ Tn in Σ begin
    add σi ≡ φ → Ti(z) (i = 1, . . . , n) to Σ′;
end;
K1 := ∅; K2 := ∅;
for each σ in Σ′ do begin
    {σ has the form φ → T(z)}
    ρ := Λ; {ρ is set to the empty string}
    for each S(y) in φ do begin
        update the key sets K1 and K2 using E;
        add map-chunk(S, K2, y) to ρ;
        add reduce-chunk([], []) to ρ;
        if S is not the first relation in φ then
            add merge-chunk(E_{K1,K2}) to ρ;
            {E_{K1,K2} is the set of constraints in E restricted to K1 and K2}
        if S is the last relation in φ then
            add reduce-chunk(z, F) to ρ;
    end;
    add divide-chunk(T) to ρ;
    Ω := Ω ∪ {ρ};
end;
end {IURG}.
It is easy to see that the computational complexity of the algorithm is O(|Σ′| · max|φ|), where max|φ| denotes the maximum number of relation symbols in a left-hand side formula φ.
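The chunk-composition logic of the algorithm can be sketched as follows (a simplification: the key-set bookkeeping for K1 and K2 is omitted, and decomposed SO tgds are represented as small dictionaries of our own design, listing premise relations, target relation and target attributes):

```python
def iurg(decomposed_tgds, E, F):
    """Sketch of the IURG chunk composition: for each decomposed SO
    tgd, emit the list of propagator-chunk descriptors in the order
    prescribed by the algorithm (K1/K2 key bookkeeping omitted)."""
    routines = []
    for tgd in decomposed_tgds:
        premise, target = tgd["premise"], tgd["conclusion"]
        rho = []
        for i, S in enumerate(premise):
            rho.append(("map-chunk", S))
            rho.append(("reduce-chunk", [], []))
            if i > 0:                        # S is not the first relation
                rho.append(("merge-chunk", E))
            if i == len(premise) - 1:        # S is the last relation
                rho.append(("reduce-chunk", tgd["target_attrs"], F))
        rho.append(("divide-chunk", target))
        routines.append(rho)
    return routines
```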
Example 3 The IURG algorithm, applied to the ISO tgd of Example 2, produces the instance update routine of Example 1, generated by the following list of propagator chunks: map-chunk(Cities, prov, [city, pop]); reduce-chunk([], []); map-chunk(Provinces, prov, [reg]); reduce-chunk([], []); merge-chunk({Cities.prov = Provinces.prov}); reduce-chunk([city, prov, pop, reg], ∅); divide-chunk(NewCities).
6. The architecture
The architecture of the system, still under development, shown in Figure 2, consists of the following modules:
• Visual Schema Manager (VSM)
This module consists of the visual interface (VI) and of the VisualToMapping translator, which generates the SO tgds associated with visual changes. The visual interface we are building is inspired by the well-known Clio project [15]; it allows the creation of mappings between schema versions using visual operators such as select, link, move, delete, add and modify. It also allows functions on attributes to be written and associated with other attributes, and equality constraints to be specified.
• Instance Update Routine Generator (IURG)
This module, based on the IURG algorithm presented
in section 5.2, takes an SO tgd as input and returns the
Map-Reduce-Merge instance update routines.
• Network Manager (NM)
This module coordinates the execution process described in section 4. It also provides an interface to
write Map-Reduce-Merge routines.
The system uses the Java platform and Hadoop.
Figure 2. The ENVISION system architecture
7. Conclusions and future work
A schema update language, together with an algorithm to translate its statements into Map-Reduce-Merge instance update routines, has been presented. This language makes it possible to design a visual interface and, hence, lays the foundations for building a complete tool to support schema evolution in distributed databases, whose architecture has also been presented. The next step we have planned is to enrich our model with a simulation function (extending the NM module functions) to check the effects of changes before performing them: on the one hand, this provides the DBA with a further tool to manage changes; on the other hand, such a function is very important for us in order to study the efficiency of the system, that is, to fulfill our goal of making the schema evolution process as efficient as possible in distributed databases.
[1] J. Banerjee, W. Kim, H.-J. Kim, and H. F. Korth. Semantics
and implementation of schema evolution in object-oriented
databases. In U. Dayal and I. L. Traiger, editors, SIGMOD
Conference, pages 311–322. ACM Press, 1987.
[2] P. A. Bernstein and S. Melnik. Model management 2.0: manipulating richer mappings. In Chan et al. [4], pages 1–12.
[3] E. Bertino.
A view mechanism for object-oriented
databases. In A. Pirotte, C. Delobel, and G. Gottlob, editors,
EDBT, volume 580 of Lecture Notes in Computer Science,
pages 136–151. Springer, 1992.
[4] C. Y. Chan, B. C. Ooi, and A. Zhou, editors. Proceedings
of the ACM SIGMOD International Conference on Management of Data. ACM, 2007.
[5] H.-C. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-reduce-merge: simplified relational data processing on large clusters. In Chan et al. [4], pages 1029–1040.
[6] C. Curino, H. J. Moon, and C. Zaniolo. Graceful database
schema evolution: the prism workbench. PVLDB, 1(1):761–
772, 2008.
[7] J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137–150, 2004.
[8] R. Fagin, P. G. Kolaitis, L. Popa, and W. C. Tan. Composing
schema mappings: Second-order dependencies to the rescue. ACM Trans. Database Syst., 30(4):994–1055, 2005.
[9] F. Ferrandina, T. Meyer, R. Zicari, G. Ferran, and J. Madec.
Schema and database evolution in the o2 object database
system. In U. Dayal, P. M. D. Gray, and S. Nishio, editors,
VLDB, pages 170–181. Morgan Kaufmann, 1995.
[10] M. Hatala and G. Richards. Global vs. community metadata standards: Empowering users for knowledge exchange.
In I. Horrocks and J. A. Hendler, editors, International Semantic Web Conference, volume 2342 of Lecture Notes in
Computer Science, pages 292–306. Springer, 2002.
[11] J.-B. Lagorce, A. Stockus, and E. Waller. Object-oriented
database evolution. In F. N. Afrati and P. G. Kolaitis, editors,
ICDT, volume 1186 of Lecture Notes in Computer Science,
pages 379–393. Springer, 1997.
[12] L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian. On
the logical foundations of schema integration and evolution
in heterogeneous database systems. In DOOD, pages 81–
100, 1993.
[13] B. S. Lerner. A model for compound type changes encountered in schema evolution. ACM Trans. Database Syst.,
25(1):83–127, 2000.
[14] F. D. Martino, G. Polese, S. Sessa, and M. Vacca. A
mapreduce framework for change propagation in geographic
databases. In ICEIS, 2009.
[15] R. J. Miller, M. A. Hernández, L. M. Haas, L. Yan, C. T. H.
Ho, R. Fagin, and L. Popa. The clio project: managing heterogeneity. SIGMOD Rec., 30(1):78–83, 2001.
[16] A. Nash, P. A. Bernstein, and S. Melnik. Composition of
mappings given by embedded dependencies. ACM Trans.
Database Syst., 32(1):4, 2007.
[17] M. T. Özsu. Distributed database systems. In Encyclopedia
of information systems, pages 673–682, 2003.
[18] S. Ram and G. Shankaranarayanan. Research issues in
database schema evolution: the road not taken. Univ. of Arizona, Working Paper #2003-15, 2003.
[19] J. F. Roddick, N. G. Craske, and T. J. Richards. A taxonomy
for schema versioning based on the relational and entity relationship models. In ER, pages 137–148, 1993.
[20] R. Zicari. A framework for schema updates in an object-oriented database system. In ICDE, pages 2–13. IEEE Computer Society, 1991.
Towards Synchronization of a Distributed Orchestra
Angela Guercio
Department of Computer Science,
Kent State University Stark
North Canton, OH 44720, USA
e-mail: [email protected]
Timothy Arndt
Dept. of Computer and Information Science
Cleveland State University
Cleveland, OH 44115, USA
e-mail: [email protected]
In an Internet-based multimedia application that plays an
orchestra of remote source sounds, the synchronization of
audio media streams is essential for optimal performance of
the piece. The application that enables this virtual
synchronized orchestra benefits from the use of a language
containing constructs that help express the specifications
and requirements of such a reactive system. We provide a
model for the performance of a distributed orchestra. The
architecture of the conducting system takes advantage of
the synchronization abilities of TANDEM, a multimedia
language for reactive multimedia systems that has been
extended with constructs to describe the conductor’s
gestures and the syntax and semantics of those constructs.
The PCM live streams and at least one MIDI stream per
section are multiplexed at each remote source and time
stamped before transmission. At the receiver the TANDEM
environment performs synchronization with the trigger and
the active repository.
Index Terms – Computer Languages, Multimedia Systems, Real Time Systems, Synchronization, Reactive Systems
Music has been widely used to entertain, relax (in doctors' offices, elevators and commercial centers), and nourish the artistic spirit. The ability to download our favorite orchestra piece onto a PDA or MP3 player is a reality. The popularity of tools like YouTube, the iPhone and the iPod is an example, and industry and research have devoted much attention to multimedia tools which increase our ability to interact with media to communicate. All these multimedia tools require some type of synchronization in order to produce the desired outcome. We focus our attention on the problem of an Internet-based multimedia application that plays an orchestra composed of distributed sounds.
The possibilities when distributed remote audio streams
are synchronized together are endless. To mention a few:
a) the creation of a Virtual Orchestra with sound tracks
coming from remote sources would be an invaluable
tool for a musician who wants to experience the
execution of a piece with his/her favorite musician;
b) the ability to create extemporaneous virtual sonority
that can be added to other media types, to recreate the
sound of a specific environment in a museum (e.g., the
sounds of the savannah in Africa at dawn);
c) the live performance with musicians playing in different parts of the world;
d) in classrooms, the addition of a remote soloist to the local students' musical performance.
The strong synchronization required by distributed
musical applications can be beneficial in applications in the
domains of distance education, large-scale military training,
homeland security, business or social meetings.
The result of the orchestra performance must be a
realistic reproduction of the composer’s beat, tempo, and
expression symbols performed in a synchronized way that
avoids possible stuttering effects or unsynchronized
performance. The satisfaction of all these requirements is challenging, and has led to the development of special-purpose languages for multimedia authoring and presentations. In particular, for computer techniques applied to music and musicology, which deal with the audio and/or graphical representation or score of music, with performance and sometimes with choreography, several recommendations or standards have been introduced.
Examples of such languages include the latest IEEE 1599
standard [15], SMIL [19] and all the existing markup music
initiatives [21] such as SMDL, MusiXML, MusicXML,
MDL, FlowML, HyTime, etc. While some of the above
languages only describe musical notation, others can
describe a multimedia presentation containing multiple
media sources, both natural and synthetic, as well as stored
or streamed media. In SMIL and IEEE 1599, some mechanisms for specifying the layout of the media on the screen are given, as well as primitives for synchronizing the various elements of the presentation; a small set of basic events is supported, while more complex events require the use of scripting languages such as JavaScript.
While these languages are well suited for the description of music and multimedia presentations on the Web, they are of limited use for creating more general distributed multimedia applications, since programming is only available through scripting languages that have limited power. To support the construction of larger-scale applications, approaches such as the use of special multimedia libraries along with a language, as in the case of Java and JMF [10], or the extension of middleware such
as CORBA [17] are available. Besides lacking essential characteristics for the development of advanced distributed multimedia applications that will be noted below, the use of libraries and/or middleware to achieve synchronization and perform other media-related services results in a less well-specified approach than can be achieved by directly extending existing general purpose languages with multimedia constructs with precisely specified semantics.
Following this approach, in [5] a language called TANDEM (Transmitting Asynchronous Non-deterministic and Deterministic Events in Multimedia systems) and its architectural model [6, 7, 8] that supports general-purpose computation have been analyzed and designed. The language constructs can be added to an existing general purpose language such as C, C++ or Java. This approach is similar to the approach taken by the reactive language Esterel [2, 1], which adds reactivity to general purpose languages. We extend the language by introducing the syntax and semantics of new constructs for synchronization of a distributed orchestra of audio media. These constructs express the temporality of the piece as derived from the conductor's gesture. The semantics of these constructs expresses the temporal issues required and enforces the generation of appropriate events. The TANDEM architectural model is able to deal with audio streams so that they can be played in temporal correlation, to guarantee the synchronization after possible transformations (e.g. transpositions, distortions, etc.) and to handle possible data loss during transmission over the channel.
The conductor of a live orchestra is a simple time-keeper as well as an interpreter and communicator of the emotional content of the music being played. Classical studies on the conductor's gesture can be found in manuals such as [16]. While different conductors direct the orchestra according to their personality and expressivity, this should not affect the pure synchronization aspect of the final execution but rather increases the beauty of the performance for the listener. According to Luck [13, 14], who has performed an empirical investigation, neither the conductor's previous experience nor the radius of curvature with which the beat was defined alters conductor-musician synchronization. Only the experience of the participants in the experiments was significant and affected their synchronization ability. On the basis of these results, we assume for simplicity a set of conducting gestures such as [18]. The gestures are independent of the experience of the conductor. Each gesture of a conductor is represented by a vector. A gesture has a speed measured by a quantum.
We assume that the conductor performs with two hands: one hand maintains the beat (we assume the right), the other controls volume, attack, etc. In particular, for the left hand,
a) the vertical gesture of the conducting hand, down or up, controls the volume, which increases when the direction goes up and decreases when it goes down. This gesture is interpreted as a crescendo or diminuendo during the piece execution.
b) the horizontal-toward gesture (horizontal towards a section of instruments, moving the hand in a downward movement) of the conducting hand lets the conductor start the section's audio stream. This gesture is interpreted as an attack. Starting a section by pointing at it starts the buffering of the section's audio stream, while the playback of the section's stream will start according to the synchronization of the beat.
c) the circular gesture of the conducting hand lets the conductor bring a section or the whole orchestra to a stop. This gesture is interpreted as an interrupt.
When a gesture starts, its speed, direction and amplitude are identified.
The speed of the gesture: The speed is assumed to be maintained constant for the duration of a gesture. The speed of the gesture is used to help the synchronization of the multimedia streams with the beat.
The direction of the gesture: The direction of the gesture is used to identify the gesture and when the change of gesture occurs. The direction may also identify a change of volume, as in the vertical gesture, or an attack, as in the horizontal-toward gesture.
The amplitude of the gesture: The amplitude of the gesture is important in the vertical gesture, since the larger the amplitude of the gesture, the higher the volume of the orchestra. The change in volume is never abrupt and is modeled by a progressive variation.
Fig. 2.1 Orchestra Conductor Movements
In the case of a virtual orchestra with independent remote and local sources, multimedia data integration must take into consideration the synchronization of the media streams that are streamed in real-time and, possibly, compensate for jitter or any other possible alterations caused by the orchestra participants.
In this section we will give an overview of the performance environment. The principal actors are the orchestra sections, each composed of a group of performers. Each section may contain either live and/or recorded musicians, local or remote. The other principal actor is the conductor, who may be either a live or a virtual conductor. Anything that is local is in the same location as the conductor. If TANDEM is extended with commands to describe the conductor's gestures, a virtual conductor can be used to create an animated avatar representing the conductor and reproducing the correct gestures. The system will respond to the gestures and will produce reactions to various situations using triggers and the active repository.
In order for the conductor and live musicians to interact, the actors must each be able to see each other. The musicians see the conductor by viewing a video stream (either of the live or of the animated virtual conductor). The conductor needs to have a view of the entire orchestra to direct his gestures at particular sections. This is done through a virtual stage (see fig. 3.1) - a screen with multiple windows, each containing one or more sections. The position of these windows/sections is defined using spatial relations as in SMIL. When the performance begins, the conductor sees a screen containing the virtual stage. He can then direct his gesture to the relevant sections. Each window on the virtual stage can be filled either with a live or recorded video of the section, or with a static image or animated representation of the section.
The live conductor's gestures are captured using gesture recognition techniques, possibly incorporating sensors [20] or computer vision technology [3]. The gestures are then transformed into conductor actions (see the following section) using motion tracking and classification algorithms. The conductor's actions can be used to drive an animation of the conductor for remote live performers if a video stream of the conductor is not available.
Fig. 3.1 The Virtual Stage
Fig. 3.2 The TANDEM Model
Live local performers respond immediately to the conductor's gestures, while remote performers' responses are somewhat delayed due to differing amounts of network
latency. This difficulty is overcome by the Active
Repository [9] which acts as a buffer for remote and
recorded performance data. Immediate response to the
conductor’s gestures is achieved via synchronization
constructs of TANDEM on the data in the Active
Repository (see fig.3.2).
Just as live performance on MIDI devices can be
captured for later playback, so can the conductor’s gestures
be captured (in the Active Repository) and later “played
back”, that is, used to drive a virtual conductor animation or
avatar to control an orchestra. Also analogously to MIDI, it
is possible to program the conductor’s performance
(without going through the actual conducting gestures –
analogous to composing MIDI scores without actual
performance) and use the program to drive the virtual
conductor. Of course it is usually necessary for the
conductor to respond to the performers during the
performance. This can be supported by the system by
defining a number of triggers based on performance conditions, which will fire when the conditions are met and cause particular gestures to be performed.
The structural model of the system is depicted in fig. 3.3.
A. Construct Definition
The syntax of the constructs must express the
synchronization between the gestures of the conductor and
the multimedia streams that represent the instruments of the
orchestra. The conductor uses a virtual stage interface
where the sections have been spatially arranged on the
screen before the beginning of the performance. On the
virtual stage a section represents a number of musicians
grouped by instrument type, i.e. the 1st violins, the 2nd violins, the flutes, and so on. A section consists of a set of one or more media streams or it could be under local
control. Multiple musicians playing together remotely will
be captured by a single camera and a single media stream
will be transmitted over the network. Multiple musicians
remotely located in multiple geographic locations that are
part of the same section produce a number of streams equal
to the number of remote locations. Multiple sections of the
virtual orchestra may also be multiplexed together in a
single stream if they are at the same remote location.
Multiple musicians of the same section that play locally are
not associated with any stream.
In this section we will describe the language constructs
that support synchronization for a synchronized orchestra.
The exact syntax of the constructs will depend on the host
language the multimedia constructs are embedded in. In the
examples that are given, the host language is C. This results
in the constructs having a “C-like” syntax. It is expected
that the processing of the synchronization constructs will be
handled by a preprocessor before passing the results to a
compiler for the given host language. A run-time environment then supports run-time synchronization. The implementation will be similar to Esterel [1].
Fig. 3.3 The Structural Model of the Conductor System
Conductor gestures may be directed at the orchestra as a
whole or at individual sections. Gestures directed at
individual sections may be classified as either immediate
local; immediate remote; or delayed remote. The gesture is
immediate local if it is directed at a local section. The gesture
is immediate remote if it is directed at a remote section that
produces one multiplexed stream. In this case, the remote
players of the section will require a certain, small amount of
time (depending on the roundtrip network latency) to
respond to the gesture, but the Active Repository can mask
this latency by modifying the buffered stream (increasing
playback rate, decreasing volume, etc.). If the gesture is
directed at a part of the remote section (this would occur if
a remote section contains more than one instrument type)
since the remote section produces one multiplexed stream,
the latency in responding to the gesture by part of the
remote section cannot be masked by the Active Repository,
since this would involve modifying (e.g. speeding up)
multiple sections of the multiplexed stream, not just the
single one to which the gesture is addressed. We assume
that the gestures intended for the virtual orchestra as a
whole are either immediate local or immediate remote
(there exists one or more remote sections and each remote
section generates a multiplexed stream).
Given the previously defined virtual stage, we define the virtual orchestra as a group of section/region pairs:
group my_orchestra = (section1, region1,
section2, region2 …)
Each instrumental section must then be defined either as associated with one or more streams or as a local section.
For example:
section wind =(windstream1, windstream2)
section chorus =(local)
The streams are defined in TANDEM in terms of their
various attributes. The actions of the conductor are
connected with the gestures recognized by the gesture
analyzer that the conductor performs to guide the sections
of the orchestra. The enumerated list of available actions is:
enum actions {beat, attack, interrupt,
cutoff_section, cutoff, crescendo, diminuendo}
The time signature is indicated by the “beat”. The beat is
given by the gesture of the right hand of the conductor. The
beat is identified by the change of direction of the end of
the baton. The gesture analyzer produces the command
beat(time, position)
The speed of the baton can be derived from the times and
positions of a sequence of two beats. We assume that the
conductor, as well as the musicians, are aware of the time
signature of the piece being performed. The rigid value of the metronome can be slightly stretched by the personality of the conductor, which can be detected through the change of speed between beats. At each beat the synchronization of
the media streams is enforced.
The attack gesture indicates that a section or the orchestra as a whole should start to play. The gesture is a horizontal toward gesture (pointing) with the left hand, directed at the section or orchestra. Both the time of the attack and the section indicated (or the orchestra as a whole) are retrieved and passed as parameters to the command by the gesture analyzer. The command is described as:
attack(time, section)
attack(time, orchestra)
The time of the attack is synchronized with the time of
the beat relative to the stream indicated by the section
parameter. If the attack is directed to the whole orchestra,
all the streams will be synchronized as a group.
We assume that when two sections are addressed to start at the same time, two sequential movements indicating attack are detected in very close time sequence. If the time difference is smaller than an ε (where ε must be smaller than a beat time), the two attacks are interpreted as one, the two sections are processed as being in one group, and the multiplexed streams relative to the two involved sections are synchronized with respect to the first common synch point detected among the group participants. In a fine synchronization, the distance between synch points must be imperceptible to the human ear.
The crescendo (resp. diminuendo) gesture, which is indicated by an upward (resp. downward) vertical movement of the open left palm, increases (resp. decreases) the volume of a section. The command produced by the gesture analyzer contains the time at which the gesture occurs as well as the section to which it applies (or the orchestra as a whole) and is described as:
crescendo(time, section)
crescendo(time, orchestra)
(resp., diminuendo(time, section)
diminuendo(time, orchestra))
The semantics of the command enforces the volume
alteration accordingly in a synchronized way.
The circular gesture of the left hand is used to interrupt a section or the whole orchestra. The command produced by the gesture analyzer contains the time at which the gesture occurs and the section to which it applies, and is described as:
interrupt(time, section)
interrupt(time, orchestra)
The command causes an abort of the section or of the whole orchestra.
The conductor system is a distributed multimedia reactive
system modeled as a communicating concurrent state
machine in which multiple triggers are concurrently active
at different remote sites (see fig. 3.2). We distinguish two types of states: computational states and multimedia states. A computational state is a set of triples (entity, attribute, value), where an entity could be a stream, a variable, a constant, a spatial constraint, a temporal constraint, a mobile process caused by migration of code over the Internet, or a channel between two computational units. A multimedia
state M is a set of multimedia entities such as streams,
asynchronous signals (denoted by η), partial conditions, or
attributes of media objects such as streams or asynchronous
signals. A transition between multimedia states occurs if
media entities are transformed. A change of multimedia
state also generates changes in computational states.
Transformation of a multimedia state involves passage
through many computational states with no multimedia
state change.
In a real time conductor system, two concepts are very
important: continuity, which contains the notion of
temporality, and context which expresses spatio-temporal
relationship between objects. Breakage of either of them
causes lack of perception and comprehension. In
multimedia reactive systems, continuity is guaranteed by
the physical presence of the multimedia streams and their
temporal relationship to each other which is guaranteed by
the presence of multiple clocks and the presence of synch
points at regular temporal intervals. The temporal logic of
the system and the state behavioral semantics provides the
behavioral rules for the language by describing the states
and transitions between states during computation.
We use state logical behavior to describe the semantics of
the constructs introduced. The constructs attack and
interrupt produce events that generate trigger operations.
Let α be the action taken that transforms the multimedia state μ into μ′; then a state transformation caused by an action α given the set of entities Ψ is written as Ψ: μ →α μ′.
We define a streaming code number k, where k ≥ 1. The
streaming code number encodes the reaction to an
asynchronous signal, such as attack or interrupt, performed
on the streams samples between two synch points of a
stream. When k = 1 the abortion action is strong; for k > 1 the abortion action is weak. We will denote the state after applying the actions in a single iterative cycle as μI. Under the assumption that the smallest data unit is an audio sample or a video frame, the synch point for an audio/generic media stream corresponds to m (m ≥ 1) data units. Then the state transition for traversing one synch point is (αI)m. An asynchronous signal η that initiates a preemptive action, such as an interrupt, has to wait one synch point to reach the new state μ′. However, if the abortion is strong (the most general case in the conductor system) the streaming involves the whole orchestra and is interrupted at the first synch point (k = 1) of the stream, and control moves out of the beat loop. If the interrupt is weak (useful for more general multimedia systems) the streaming is completed after the current clip/audio stream is over (k > 1).
During abortion the current state is saved. However, the
multimedia state is defined as the disjoint union of the
frozen state and the new state derived from the alternate
thread of activity so that the frozen state can be restored
after the next attack action. At the first attack of the
performance there are no frozen states and μsusp ⊕ μ’ = μ’.
Table 1 describes semantic rules for interrupt, and attack.
The constructs crescendo (resp. diminuendo) perform
transformation actions which are executed in the
transformer. The construct increase or decrease the volume.
A stream s is a pair of the form (sA, sD) where sD is a
sequence of elements containing the data and sA is the set of
attributes associated with the stream s. We use σ(sD, i) to
denote the ith frame/sample (data_element) in the stream.
Accessing a frame/sample f in a stream s is performed by the access operator, defined as π1(σ(π2(s), i)) if 0 < i ≤ ||s||, and ⊥ (read: undefined) otherwise, where π1 accesses the attribute elements of the stream and π2 accesses the data elements of the stream. Therefore, the crescendo construct is expressed as crescendo(s) = π1(σ(sD, i)).
The system architecture is depicted in fig. 5.1. Each
remote source has several musical instruments and one or
more MIDI instruments. The PCM audio of the instruments
is mixed onsite. The sampled PCM data are multiplexed
with MIDI data and stored in time stamped packets. Each
packet (see fig 5.2) contains a sequence of PCM samples,
followed by a sequence of MIDI events occurring in the
time interval, plus a time stamp. The number of samples
collected in each packet and the sampling rate, give the
granularity of future synchronization.
Fig. 5.2 Multiplexed packet (time stamp, number of MIDI events, MIDI events, PCM samples)
Fig. 5.1 The Distributed System Experimental Prototype (remote sections 1…n, each multiplexing the delayed PCM audio of its instruments with a MIDI instrument stream)
The MIDI stream generated is extended with one special
additional event, called the attack event, which is inserted
in the MIDI event stream at the beginning of the
performance just before the first note. The presence of this
event will explicitly determine the start of the performance.
The TANDEM language synchronizes multiple streams
based on the synch points identified by the packets. For a
reliable performance the streams are buffered at arrival.
Table 1. Semantics of the constructs in the trigger
Due to varying tempos both within and between sections of the orchestra, there is no guarantee that a beat will correspond exactly with a synch point; the synchronization is therefore performed at the nearest synch point to the beat,
or to the time indicated by a particular gesture. In order to
meet synchronization needs in the orchestral domain, the
synch points will be chosen so that any variation from the
beat or action time is below the perceptual level.
The signals which make up the streams contain data
which includes both audio PCM data and data related to the
score - either MIDI-type messages or simple beat-based
information. The data also contains implicit time stamps
related to synch points. For live streamed data from
multiple remote sites, an atomic clock or similar
mechanism may be used to provide a precise enough timestamp that the combined performance is close enough to
perfect synchronization to be under the perceptual level.
It is sometimes impossible to deliver remote performance
data in time to avoid perceptual distortion. This may be
caused by transient high network latencies. In this case, the
Active Repository causes the delayed stream to be muted,
rather than allowing the distortion to affect the performance
of the orchestra as a whole. Once the stream has caught
back up, it will be restarted. This will result in some of the
data for the late arriving stream being skipped.
In this paper we provided a model for the performance of
a distributed orchestra. The architecture of the conducting
system takes advantage of the synchronization abilities of
the TANDEM environment via triggers and the Active
Repository, providing an effective way to synchronize live
media streams. For this purpose TANDEM has been
extended with constructs to describe the conductor’s
gestures and the semantics of those constructs has been
provided. The PCM live streams with at least one MIDI
stream per section are multiplexed at each remote source
and time stamped before transmission. The inclusion of MIDI data allows for recognition of beats in the stream for synchronization purposes.
There have been some related efforts in distributed musical performance; however, most existing systems that are not sequencers (i.e. software or hardware to create and manage computer-generated music) use prerecorded MIDI instruments or MIDI files only. For example, the virtual
conducting system described in [4] uses prerecorded MIDI
files played locally. More interesting is the approach
presented in [23] where an architecture for the management
of a distributed musical performance is given. The system
does not use a conductor, and the stream management again uses only MIDI sequences. In [12] and [22] one-way
streaming of musical rehearsal using real time PCM audio
was used but all players, including a human conductor,
were at a sender site with performance at the receiver.
In Gu [11] PCM audio was streamed over the network
in real time in compressed format. To perform compression in realistic time, only prerecorded audio was streamed instead of a live performance. The focus of the work was on the compression scheme, packet loss and the quality of the streamed audio.
[1] G. Berry, G. Gonthier, “The ESTEREL Synchronous Programming Language: Design, Semantics, Implementation”, Sci. of Comp. Progr. 19, n. 2, pp.87-152, Nov. 1992.
[2] G. Berry, “The Foundations of Esterel”, in Proof, Language
and Interaction: Essays in Honour of Robin Milner, G.
Plotkin, et al. ed., MIT Press, pp.425-454, June 2000.
[3] N. D. Binh, E. Shuichi and T. Ejima, "Real-Time Hand
Tracking and Gesture Recognition System", ICGST Int. J. on
Graphics, Vision and Image Processing, 7, pp.39-45, 2007.
[4] J. Borchers, E. Lee, W. Samminger, M. Mühlhäuser, "Personal Orchestra: A Real-Time Audio/Video System For Interactive Conducting", Mult. Syst., 9, pp.458-465, Springer, 2004.
[5] A. Guercio, A. Bansal, T. Arndt, “Languages Constructs and
Synchronization in Reactive Multimedia Systems”, ISAST
Trans. on Comp. and Soft. Eng., 1, n.1, pp.52-58, 2007.
[6] A. Guercio, A. K. Bansal, “Towards a Formal Semantics for
Distributed Multimedia Computing”, Proc. of DMS 2007,
San Francisco Sept. 6-8, pp.81-86, 2007.
[7] A. Guercio, A.K. Bansal, T. Arndt, “Synchronization for
Multimedia Languages in Distributed Systems”, Proc. of
DMS 2005, Banff, Canada, Sept. 5-7, pp.34-39, 2005.
[8] A. Guercio, A. K. Bansal, “TANDEM – Transmitting
Asynchronous Nondeterministic and Deterministic Events in
Multimedia Systems over the Internet", Proc. of DMS 2004,
San Francisco, pp. 57-62, Sept. 2004.
[9] A. Guercio, A. K. Bansal, "A Model for Integrating
Deterministic and Asynchronous Events in Reactive
Multimedia Internet Based Languages", Proc. of the 5th Int.
Conf. on Internet Computing (IC 2004), Las Vegas, June 21-24, pp.46-52, 2004.
[10] R. Gordon, S. Talley, Essential JMF – Java Media
Framework, Prentice Hall, 1999.
[11] X. Gu, M. Dick, Z. Kurtisi, U. Noyer, L. Wolf, “Network-centric music performance: Practice and Experiments”, IEEE
Comm. Mag., 43, n.6, pp.86-93, 2005.
[12] D. Konstantas, "Overview of a telepresence environment for distributed musical rehearsal", Proc. of the ACM Symposium on Applied Computing (SAC'98), Atlanta, 1998.
[13] G. Luck, J.A. Sloboda, "An investigation of Musicians'
Synchronization with Traditional Conducting Beat Patterns",
Music Perform. Res., 1, n.1, pp.6-46, ISSN 1755-9219, 2007.
[14] G. Luck, S. Nte, "An Investigation Of Conductors' Temporal
Gestures And Conductor-Musician Synchronization, And A
First Experiment", Psychol. of Music, 36(1), pp.81-99 2008.
[15] L.A. Ludovico, “Key Concepts of the IEEE 1599 Standard”,
Proc. of the IEEE CS Conf. The Use of Symbols To Represent
Music And Multimedia Objects, pp.15-26, Lugano, CH, 2008.
[16] B. McElheran, “Conducting Technique for Beginners and
Professionals”, Oxford University Press, USA, 1989.
[17] Object Manag. Group, “Control and management of A/V
streams specification”, OMG Doc. telecom/97-05-07, 1997.
[18] M. Rudolf, “The Grammar Of Conducting”, Wadsworth,
London, 1995.
[19] SMIL2.0 Specification, 2001.
[20] G. Stetten, et al., "Fingersight: Fingertip visual haptic sensing
and control", Proc. of IEEE Int. Workshop on Haptic Audio
Visual Env. and their Appl., pp.80-83, 2007.
[21] “XML and Music”.
[22] A. Xu, et al., “Real-time Streaming of Multichannel Audio
Data Over the Internet”, J. Audio Eng. Soc., 48, pp.7-8, 2000.
[23] R. Zimmermann, E. Chew, S. Arslan Ay, M. Pawar,
“Distributive Musical Performances: Architecture and Stream
Management”, ACM Trans. on Mult. Comp., Comm. and
Appl., 4, n. 2, Article 14, May 2008.
Semantic Composition of Web Services
Manuel Bernal Llinares, Antonio Ruiz Martínez, Mª Antonia Martínez Carreras, Antonio F. Gómez Skarmeta
Department of Information and Communication Engineering
Faculty of Computer Science
University of Murcia
Murcia, Spain
{manuelbl, arm, amart, skarmeta}
Abstract—Nowadays the number of applications and processes
based on Web Services is growing really fast. More complex
processes can be achieved easily through the composition of Web
Services. There are proposals like WS-BPEL to compose Web
Services but nowadays this process is done statically. There is a
strong coupling between the Web Services that are involved in
the composition and the composition process itself, thus, changes
on the services will invalidate the composition process. To resolve
this problem we have defined an architecture where the
composition processes are abstract and semantic information is
used for linking them to the right Web Services for every
Collaborative Environments,
composition, semantic.
Service-Oriented Architecture (SOA) is the platform underlying the Web services technology, which has demonstrated to fit with it by having all the required components defined in SOA: a way to describe services, including the basic information defined in SOA and more: the Web Service Definition Language (WSDL)[3]; a mechanism to represent the necessary messages: SOAP[36]; and a service to advertise the existence of services and a mechanism to search for them: Universal Description, Discovery and Integration (UDDI)[4].
But the related Web services standards go far beyond the basis of SOA. We can also find: Web Services Interoperability (WS-I)[5]; the Web Services Business Process Execution Language (WS-BPEL)[2], an orchestration language using Web services; the Web Services Choreography Description Language (WS-CDL)[6], a choreography language for Web Services; and the Web Services Choreography Interface (WSCI)[7], a
language for describing interfaces used to specify the flow of
messages at interacting Web Services.
Web services technology has become the favorite platform over which companies and institutions implement their services. This heterogeneity of Web services providers and consumers has motivated an increased interest in the composition of services in the research community. This key area of Web services is where the work presented in this paper has been developed. More precisely, the aim of this paper is to depict the building of an architecture for composing services according to an abstract description of the process and the use of semantics for annotating services.
The remainder of the paper is organized as follows. We
first give some related work about the problem of Web services
composition in section 2, then we introduce a motivating
scenario in section 3, next we present our solution in section 4.
Finally, we give conclusions and future work in section 5.
Previous work related to Web services composition has taken approaches ranging from semi-automatic composition[8,9],
where a system is built to aid the user in the process of
composing Web services (using semantic information to filter
the available services and presenting only those that are
relevant); to the automatic composition of Web services where
the work is mostly focused on the view of the service
composition as a planning problem; thus the process is done
through the use of HTN[10,11,24], Golog[12-14], theorem
proving [15-18], ruled based planning[19,20], model
checking[21-23], Case Based Reasoning (CBR)[25],
Propositional Dynamic Logic based systems[26], classic AI
planning[27], etc.
The composition of services presents two main challenges,
one of them related to the orchestration of the services and the
other one related to the heterogeneity of the data. Although all
the solutions address the problem of the orchestration of
services, either aiding the user in the manual composition of
services (by filtering information) or defining complex
semantic structures (with preconditions and post-conditions that characterize the goal that must be achieved by the orchestration of services, and then using some of the mentioned
approaches to automatically create the composition process)
very few address the problem of data heterogeneity.
There are very few proposals that support the industry standard for the composition of services (WSBPEL) in an automated way; the work presented in this paper fills this gap using semantic information.
WSBPEL is an XML-based process/workflow definition and execution language. It defines a model and a grammar for describing the behavior of a business process based on interactions between a process and its partners; these interactions occur through the Web service interface of each partner.
WSBPEL is not at all flexible with respect to the underlying services it is orchestrating: changes in those services will affect the orchestration defined in WSBPEL, making it unusable. Thus, there is a strong coupling between
the business process and the Web services it orchestrates.
Our work is focused on removing this coupling using
semantic information. The main advantage of our solution is
that it brings adaptability and fault tolerance to the industry
standard in the composition of services, providing some grade
of portability of business processes from one system to another.
WSBPEL is defined by two XML Schemas[2]:
Abstract: an abstract process is a partially specified process; it is not intended to be executed as it is. This type of process may hide some of the required concrete details.
Executable: an executable process is fully specified and therefore it can be executed.
To decouple the business process from the underlying
services that are involved in it, the abstract definition of
WSBPEL is going to be used and the work will focus on how
to transform the abstract definition of a composition of services
into an executable one.
A WSBPEL document (which describes an orchestration of
Web services) is a sequence of steps where some of them
involve an operation of a Web service as can be seen (marked
with a red circle) on the following figure.
The abstract definition of a business process keeps the
workflow but removes all the links to the Web services
involved, making the business process independent of the
underlying services but unusable as it is. There is no practical way to restore the original business process by hand, and to accomplish it automatically, additional information is needed both on the Web services description and the WSBPEL document.
This additional information is introduced both on the
services and the business process by extending its definitions
(WSDL and WSBPEL) with SAWSDL[28] annotations, which
reference concepts in an ontology. The main advantage of
SAWSDL is that it is independent of the ontology language
used, thus it is possible to use different formalisms according to
the needs of a particular domain.
On the side of the Web services, the SAWSDL annotations[28] define how to add semantic information to describe several parts of the WSDL document, such as input and output message structures, interfaces and operations. In this work the attribute “modelReference” will be used on the operations of the Web services to describe, semantically, the goal they are able to achieve.
On the side of the WSBPEL document, this attribute will be
used on every step an operation from a Web service is involved
to specify the goal the operation is required to accomplish.
At this point, we have annotated Web services and an
annotated abstract business process that we need to translate
into an executable one before it can be used. The translation of the abstract process is done by looking for the Web services suitable to accomplish the goal required in each step where a service is involved; thus the available annotated Web services must be reachable somewhere where they can be searched given a goal.
For this, we have developed a Composition Engine that is one of the main components in the architecture of the Semantic System developed in the ECOSPACE[30,31] project.
Figure 1. Simple BPEL diagram.
WSBPEL relies on the WSDL descriptions of Web services to orchestrate them, but this information guarantees only the syntactic interoperability among Web services and, in several cases, this is not enough to ensure that a business process is correctly assembled. Ideally, a business process definition should describe the orchestration in terms of the kind of Web services involved, rather than specifying concrete Web services.
Figure 2. Semantic System Architecture
The Composition Engine interacts with two main
components of this architecture.
The Discovery Repository is the component responsible for storing the annotated Web Services descriptions and related artifacts, e.g. SPARQL-based pre- and post-conditions, as well as the WSBPEL document that describes the business process to be executed.
We would like to make a logical distinction between a registry and a repository to eliminate any confusion. The term “registry” in its implementation refers to a metadata store; it is analogous to the book catalogue which can be found in a library. The term “repository” refers to the actual content that needs to be stored in addition to its metadata; a repository is analogous to the actual book shelf in a library that stores all the books.
The registry and repository infrastructure mainly comprises three registries and repositories, i.e. the Service Registry, the Service Repository and the Ontology Registry. Other applications and
architectural components (such as Semantic Service Discovery
Engine which will be detailed later) can locate the required
resources (i.e. service descriptions and ontologies) through
registry and repository infrastructure. Detailed discussion
about the Service Registry and Repository specification can be
found in the deliverable D3.2 of the ECOSPACE project[32].
Figure 3. Interface and Control Unit
The translation and execution of the annotated abstract business process is carried out through several steps.
The Dynamic Semantic Service Discovery (DSSD) is a software component that implements dynamic discovery of Web services, taking into account the preconditions and post-conditions defined in their SAWSDL descriptions. The DSSD comprises two main subcomponents: the Semantic Registry
and the Discovery Agent.
The Semantic Registry of the DSSD acts as an internal library of Web Services operations, and maintains specific data structures holding the definitions of preconditions and the descriptions of post-conditions. The Semantic Registry is coupled with a traditional Registry, which in the ECOSPACE architecture is implemented by the Discovery Repository, which holds the SAWSDL descriptions. The Semantic Registry of the DSSD fetches SAWSDL descriptions from the Discovery Repository and processes them in order to extract the semantic information linked by the URIs in the “modelReference” attributes (as described above). The Semantic Registry uses SPARQL CONSTRUCT queries to build the RDF graph corresponding to the effects of each Web Service operation, and stores the preconditions.
The Discovery Agent is a specialized software component; it has a knowledge base (i.e. a formal description of some information that is known to the agent), and it accepts a goal (i.e. the description of an objective). The Discovery Agent searches for a Web Service operation whose effects allow for the achievement of the goal. The Discovery Agent interacts with the Semantic Registry in order to explore the effects of Web Services operations, and to verify the satisfiability of their preconditions using the information contained in its knowledge base. A further description is available in [33].
The Composition Engine has a web service interface and
offers the execution of an abstract business process (annotated
semantically) as if it were executable in a completely
transparent way.
Context information, like a world description and an invocation context, is provided, as well as the abstract business process.
Figure 4. Composition Engine, process overview
The figure above gives an overview of the translation process. The “Composer” is responsible for driving the whole process, dynamically adapting the behavior of the Composition Engine depending on the context information.
The first stage of the translation is the analysis of the
annotated abstract business process. All the goals referenced in
the WSBPEL are extracted and used with the context
information to query the DSSD for suitable Web services. The
DSSD will use that information to look up in the registry where
the services are published.
The information collected in this stage is a list where every
goal is paired with the most suitable Web service that is able to
achieve it.
In the next stage that information is used to translate the WSBPEL into an executable business process. This is the most complex stage because of the flexibility of the WSBPEL language: here, the descriptions of the selected Web services are adapted to meet the syntactical requirements of the WSBPEL in case they do not already meet them.
The last stage is where the executable process, obtained
before, is deployed in a BPEL Engine like ActiveBPEL[34] or
Glassfish[35] (which are the two BPEL Engines considered for
this development). Then, the business process is executed and the Composition Engine returns the results back to the client.
This newly created business process must be undeployed
from the BPEL Engine after its execution because it is not
intended to have a lifespan beyond the execution requested to
the semantic system.
This paper presents an important contribution to solving a key issue of the Composition of Services with the industry-promoted standard: the tight coupling between the business process and the underlying services.
This work introduces adaptability to the composition of services, not only taking into account possible changes in the services, but also introducing the ability to select the most suitable services depending on the context in which the business process is being executed (i.e. based on costs, requirements, prohibitions, user preferences…). It thus provides context-awareness[37] to the composition of services.
At this time the first two stages of the Composition Engine are completed, and the executable process resulting from the translation at the second stage has been proved to work on Glassfish. Our future work includes the implementation of the last stage, with the difficulty that every BPEL Engine has its own custom artifacts that need to be created around the WSBPEL in order to deploy the process, and there is neither an API nor an automatic way to do it programmatically, so a lot of effort must be put into implementing this last stage.
[1] Abhijit Patil, Swapna Oundhakar, Amit Sheth, and Kunal Verma. “Meteor-s web service annotation framework”. In Proceedings of the 13th International World Wide Web Conference, New York, USA, May 2004.
[2] OASIS, Web Services Business Process Execution Language Version
2.0, wsbpel-v2.0.pdf.
[3] W3C. Web Services Description Language (WSDL). Online:
[4] T. Bellwood et al. Universal Description, Discovery and Integration
specification (UDDI) 3.0. Online:
[5] WS-I, Web Services Interoperability Organization. Online:
[6] W3C, Web Services Choreography Description Language (WS-CDL).
[7] W3C, Web Services Choreography Interface (WSCI). Online:
[8] Evren Sirin, James Hendler and Bijan Parsia. “Semi-automatic
Composition of Web Services using Semantic Descriptions”.
[9] David Trastour, Claudio Bartolini and Javier Gonzalez-Castillo. “A
Semantic Web Approach to Service Description for Matchmaking of
[10] Sirin E., et al., HTN Planning for Web Service Composition Using
SHOP2. Web Semantics Journal. 2004. 1(4): p. 377-396.
[11] Sirin E., B. Parsia and J. Hendler. Template based composition of
semantic web services, in AAAI fall symp on agents and the semantic
web. 2005: Virginia, USA.
[12] Narayanan, S. and S.A. McIlraith. Simulation, verification and
automated composition of Web services. In The 11th International World
Wide Web Conference. 2002. Honolulu, Hawaii, USA.
[13] McIlraith, S.A., T.C. Son, and H. Zeng, Semantic Web Services. IEEE
Intelligent Systems, 2001. 16(2): p. 46-53.
[14] McIlraith, S. and T.C. Son. Adapting Golog for composition of Semantic
Web services. In Knowledge Representation and Reasoning (KR2002).
2002. Toulouse, France.
[15] Waldinger, R.J., Web Agents Cooperating Deductively, in Proceedings
of the First International Workshop on Formal Approaches to Agent-Based
Systems - Revised Papers. 2001, Springer-Verlag.
[16] Lämmermann, S., Runtime Service Composition via Logic-Based
Program Synthesis, in Department of Microelectronics and Information
Technology. 2002. Royal Institute of Technology.
[17] Rao, J., P. Kungas, and M. Matskin. Application of Linear Logic to Web
Service Composition, in The 1st Intl. Conf. on Web Services. 2003.
[18] Rao, J., P. Kungas and M. Matskin. Logic-based Web services
composition: from service description to process model, in The 2004 Intl
Conf on Web Services. 2004. San Diego, USA.
[19] Ponnekanti, S.R. and A. Fox, SWORD: A Developer Toolkit for Web
Service Composition, in The 11th World Wide Web Conference 2002:
Honolulu, Hawaii. USA.
[20] Medjahed, B., A. Bouguettaya, and A.K. Elmagarmid. Composing Web
services on the Semantic Web. VLDB Journal. 2003. 12(4).
[21] Kuter, U., et al. A Hierarchical Task-Network Planner based on Symbolic
Model Checking, in The International Conference on Automated
Planning & Scheduling (ICAPS). 2005.
[22] Traverso, P. and M. Pistore. Automated Composition of Semantic Web
Services into Executable Processes, in The 3rd International Semantic
Web Conference (ISWC2004). 2004.
[23] Pistore, M., et al. Automated Synthesis of Composite BPEL4WS Web
Services, in IEEE Intl Conference on Web Services (ICWS’05).
[24] Massimo Paolucci, Katia Sycara and Takahiro Kawamura. Delivering
Semantic Web Services. WWW2003.
[25] Benchaphon Limthanmaphon and Yanchun Zhang. Web Service
Composition with Case-Based Reasoning, in 14th Australian Database
Conference 2003. Adelaide, Australia. Conferences in Research and
Practice in Information Technology, Vol. 17.
[26] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Richard Hull
and Massimo Mecella. Automatic Composition of Transition-based
Semantic Web Services with Messaging, in Proceedings of the 31st
VLDB Conference, Trondheim, Norway, 2005.
[27] Rao, J., et al., A Mixed Initiative Approach to Semantic Web Service
Discovery and Composition: SAP’s Guided Procedures Framework, in
The IEEE Intl Conf on Web Services (ICWS’06). 2006.
[28] Semantic Annotations for WSDL Working Group website,
[29] Kopecký, J., Vitvar, T., Bournez, C. and Farrell, J. (2007) SAWSDL:
Semantic Annotations for WSDL and XML Schema. IEEE Internet
Computing, 2007, 11(6), 60-67.
[30] ECOSPACE project,
[31] ECOSPACE Deliverable 3.7 “Final version of the augmented
[32] ECOSPACE Deliverable 3.2 “Middleware Open Interfaces and Service
Support Prototype”.
[33] Iqbal, K., Sbodio, M. L., Peristeras, V. and Giuliani, G. (2008) Semantic
Service Discovery using SAWSDL and SPARQL, Proceedings of
SKG 2008, IEEE Press, 2008 (to appear).
[34] ActiveEndpoint, The ActiveBpel Community Edition BPEL Engine.
[35] Glassfish Community.
[36] W3C, SOAP specification.
[37] Anind K. Dey, Gregory D. Abowd. Towards a Better Understanding of
Context and Context-Awareness. GVU Technical Report GIT-GVU-99-22,
College of Computing, Georgia Institute of Technology, 1999.
International Workshop on
Distant Education Technology
(DET 2009)
Paolo Maresca, University Federico II, Napoli, Italy
Qun Jin, Waseda University, Japan
Eclipse: a new way to Mashup.
Paolo Maresca
Dip. Informatica e Sistemistica
Università di Napoli Federico II, Italy
[email protected]
Giuseppe Marco Scarfogliero
Università di Napoli Federico II, Italy
[email protected]
In our approach to designing enterprise solutions, there is the
need to realize situational applications to manage all the
enterprise business processes that the major Enterprise
Applications cannot handle, due to the particular nature of
these processes. Their specificity and their lower relevance
to the global mission make them unattractive to software houses
and customers, given the high costs of design and development.
Hence the need for a solution that guarantees low costs and
short production times. Our interest lies in mashup applications
and web 2.0 capabilities. Our intent, with this paper, is to
present the Eclipse platform as a very good solution to the
problem of designing and developing mashup applications,
showing what the classical levels of a mashup application are
and how the Eclipse platform can satisfy all of a mashup's needs
thanks to its modular and flexible structure. In conclusion we
also show the aims that govern the Eclipse community and the
constant rejuvenation process that gives us confidence in future
possibilities in this direction.
1 Introduction
Today, the Information Technology scenario is undergoing a
deep evolution under the unceasing pressure of the market,
which shows new needs every day. This change is led by the
technology evolution process, which offers innovative
business opportunities arising from new discoveries.
The software production sector for enterprises is certainly
one of the scenarios most affected by this change: next to
the Enterprise Applications, developed by IT as the solution
to the largest part of an enterprise's business problems,
there is the need for Situational Applications, software
built ad hoc to manage particular business processes linked
to different realities. Very often the resources destined to
the production of these applications are limited, because of
the lower relevance
that they have in the global mission.

Lidia Stanganelli
DIST - University of Genoa, Italy
[email protected]

The tendency is to adopt low-quality software or to use
unconventional alternatives, using software built for other
purposes to achieve one's own goals.
The main difficulty in investing in the production of
software of this kind lies in the "artistic" and "social"
nature of the business processes to model, in the sense
that their particularity and specificity do not allow their
implementation in Enterprise Applications.
So, the challenge is to provide very flexible, agile and
low-cost methods and processes to develop Situational
Applications, in order to exploit the business opportunity
represented by the "Long Tail".
The possibilities offered by web 2.0 technologies are
among the most accredited solutions to this problem. In
this scenario, mashups have great relevance.
2 Mashup
A mashup is a lightweight web application which allows
users to remix information and functions belonging to
different sources and to work with them to build
software in a completely new, simple and quick way.
Users can efficiently model their own business processes
under their own vision of the problem, achieving a
result so particular and specific that it would be
impossible to obtain with older technologies.
Mashups stand on the fundamental concept of data and
services integration; to operate in this way there are three
main primitives: Combination, Aggregation and Visualization.
The first allows the collection of data from heterogeneous
sources and their use within the same application; the
second allows operation on the collected data, measuring
them and building new information starting from them; the
last is used to integrate data in a visual way, using maps
or other multimedia objects.
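The three primitives can be illustrated with a minimal Java sketch. The data sources, the record type, and the method names are hypothetical, introduced here only for illustration; a real mashup would pull the values from feeds or APIs and render them in a widget.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Minimal sketch of the three mashup primitives over two
// hypothetical data sources (e.g. two feeds of city temperatures).
public class MashupPrimitives {

    // Combination: collect records from heterogeneous sources
    // into a single working set.
    public static List<Double> combine(List<Double> sourceA, List<Double> sourceB) {
        List<Double> all = new ArrayList<>(sourceA);
        all.addAll(sourceB);
        return all;
    }

    // Aggregation: derive a new measure from the combined data.
    public static double aggregateAverage(List<Double> data) {
        double sum = 0.0;
        for (double d : data) sum += d;
        return data.isEmpty() ? 0.0 : sum / data.size();
    }

    // Visualization: render the derived information for the user
    // (here reduced to a formatted label; a real mashup would bind
    // it to a map or a graphical widget).
    public static String visualize(String city, double avg) {
        return String.format(Locale.US, "%s: average %.1f C", city, avg);
    }

    public static void main(String[] args) {
        List<Double> feed = Arrays.asList(14.0, 16.0);
        List<Double> api  = Arrays.asList(18.0);
        double avg = aggregateAverage(combine(feed, api));
        System.out.println(visualize("San Francisco", avg)); // San Francisco: average 16.0 C
    }
}
```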
From a technological view of the mashup and of its data
and services integration problem, a natural
representation of the problem can be obtained
using a layered, pyramidal approach.
Fig.01 – The Mashup Pyramid
Fig. 02 – The Eclipse Integration Pyramid
In the lowest abstraction layer there are Data Feeds and
the web technologies they involve. They represent a
good solution for accessing up-to-date data in a quick and
secure way.
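As a sketch of how this lowest level can be consumed, the following parses an RSS fragment with the standard javax.xml APIs and extracts the item titles. The feed content here is an inline string standing in for a live URL, and the element layout is the minimal RSS 2.0 shape.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Sketch of the "Data Feeds" level: extract item titles from an
// RSS 2.0 fragment. A real mashup would parse the XML from a URL
// instead of an in-memory string.
public class FeedReader {

    public static List<String> itemTitles(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                NodeList children = items.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    if ("title".equals(children.item(j).getNodeName())) {
                        titles.add(children.item(j).getTextContent());
                    }
                }
            }
            return titles;
        } catch (Exception e) {
            throw new RuntimeException("feed parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String rss = "<rss><channel>"
                + "<item><title>First news</title></item>"
                + "<item><title>Second news</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(rss));
    }
}
```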
At the immediately higher level live the APIs, used to
obtain data dynamically and services on demand.
A greater level of abstraction is achieved by Code
Libraries, which can be thought of as application frameworks
and API packages built to solve certain kinds of problems.
Above the Code Library level stands the GUI Tools level,
made of widgets and technologies related to the
composition of small graphical applications to show data
or to allow access to a service.
At the top of the pyramid there is the "Platform" level,
composed of all the tools and platforms that support the
building of mashup applications, allowing the composition of
single graphical elements and lower-level data.
The lowest level is "None – No Integration", which
represents the possibility of having no integration with
external tools when such integration is not needed.
The "Invocation" level represents the integration
obtained by invoking tools and services external to
Eclipse from within the platform itself. Services are executed
as external processes, distinct from the IDE's own, using
the Eclipse resource manager to start them.
The platform makes it possible to manage a tool-resource
association registry independent from the operating
system's.
The "Data" level is certainly the one that offers the greatest
degree of integration. The Eclipse platform, in fact, allows
data to be collected from heterogeneous sources, given a
structure, and provided to one's own applications, in a
coherent and very flexible way.
The "API" integration level grafts perfectly onto the Data level.
The extreme flexibility of the Data level is balanced by
the need to decode and understand data and to maintain their
integrity. Using APIs allows data to be accessed in a coherent,
secure and especially dynamic way, so the programmer
is relieved of the burden of managing data
explicitly. With APIs comes the introduction of the
concept of service, intended as an on-demand action on
data. The modular structure of Eclipse allows each
application to define its own APIs and services, which
become usable by the platform itself and by its components.
At the top of the pyramid there is the GUI integration level,
which allows many tools or applications to share the
platform's Graphical User Interface, becoming a unique
application perfectly integrated in the IDE structure,
even when starting from different applications.
3 Eclipse
At this point the complexity of this model should be clear to
the reader, as should the need to act on each of the
pyramid levels in the application-building process in
order to obtain a flexible and complete development
process for a mashup application. Hence the need for
an integrated development environment capable of adapting
itself to every kind of need thanks to its modular and
flexible architecture, allowing every aspect of the
mashup problem to be faced, and driving the developer through
the whole production process up to the deployment and testing of
the final application.
These requirements are well satisfied by the open-source
development platform Eclipse, which can adapt
itself to every scenario thanks to its modular
architecture. Integrability has been one of the main directives of
the Eclipse project since its birth: the platform
architecture supports five different integration levels, as
represented in the following diagram.
4 Points of Convergence
There is a clear correspondence between the Mashup
pyramid levels and the Eclipse Integration Pyramid ones:
Fig. 03 – Correspondence between pyramids
Eclipse is thus a unique environment in which to carry out
the development of the environment itself, with ease of
integration in the platform.
The last fundamental step is to bring the realized Eclipse
mashup application to the web. Because of its genesis as a
stand-alone software development tool, the real possibilities
of Eclipse in the web 2.0 field are sometimes not clear.
There are many projects that allow the platform to
be accessed and used from the web through a common
browser. Among them, one of the most
interesting is the "Eclifox" plugin, developed as an IBM
alphaWorks project; it makes a remote Eclipse instance
available on the web through the Jetty web server,
transforming SWT-based GUIs into XUL-based GUIs. XUL is the
well-known language used by Mozilla products such as Firefox.
Another important perspective is brought by the "Rich
Ajax Platform" (RAP) project, which will be a component of
Eclipse Galileo with maximum compatibility with
the platform. This project allows the design of Ajax
applications based on Eclipse in a simple way, very
similar to RCP application building, substituting the SWT
widget library with RWT, built for the web. RAP is thus a very
good candidate for building mashup application GUIs,
because the entire application is transformed into a web
2.0 application, using the common Java technologies for
server-side programming without the need for an Eclipse
instance running on a server.
The "Data" level of the Eclipse Integration Pyramid makes it
possible to manage Data Feeds, the base of the mashup pyramid,
extending these possibilities to all other structured data
belonging to other sources, such as heterogeneous databases.
This perspective appears very interesting for building
enterprise mashups, which realize the convergence
between data belonging to enterprise databases and data
belonging to web services external to one's own enterprise.
Eclipse Galileo will offer many opportunities in this
direction, including the Data Tools Platform (DTP) project and
the famous Business Intelligence and Reporting Tool
(BIRT), which allow data to be collected and structured using the
Open Data Access (ODA) framework, which realizes the
connection with the most common data sources: XML,
Web Services, CSV files and JDBC. The data obtained can
be easily managed by the well-known middle-level Eclipse
frameworks and be the base for EMF applications, among others.
The "API" level realizes integration through the
platform APIs and the plugins that compose a particular
installation. The modular structure of Eclipse makes it easy
to use external APIs or code libraries, either natively
or through particular plugins. A
famous example of the latter possibility is offered by the
"gdata-java-client-eclipse-plugin" which, once installed,
makes it easy to create Java applications
that use the common Google APIs. These possibilities
make the platform a natural candidate for realizing
the integration required by the mashup "API" and
"Code Libraries" levels.
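As a sketch of how a plug-in contributes to the platform, an Eclipse plug-in declares its contributions declaratively in a plugin.xml manifest. The fragment below is hypothetical (the `com.example.mashup` identifiers are invented for illustration); only the `org.eclipse.ui.views` extension point is a standard part of the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- plugin.xml sketch: a hypothetical mashup plug-in contributing
     a view to the workbench through the standard org.eclipse.ui.views
     extension point. -->
<plugin>
   <extension point="org.eclipse.ui.views">
      <view id="com.example.mashup.cityView"
            name="City Information"
            class="com.example.mashup.CityInformationView"/>
   </extension>
</plugin>
```

Once declared this way, the view (and any APIs the plug-in exports) becomes usable by the platform itself and by the other plug-ins of the installation, which is what makes Eclipse's modular structure suitable for the mashup "API" and "Code Libraries" levels.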
The "GUI" level is certainly one of the most powerful
and well-tested integration levels in Eclipse. The extreme
simplicity of extension and personalization makes the
platform suitable for realizing any kind of application,
even starting from different existing applications, using
perspectives, views and editors.
Eclipse thus turns out to be the perfect environment in which
to integrate mashup application widgets directly into its
architecture, with full flexibility and support.
5 Web Services
One of the most interesting sources of data and services for
mashups is represented by web services, because using
them makes it possible to link services belonging to an
enterprise SOA with services belonging to an external WOA.
Currently, service integration in Eclipse is managed at the
Data level through ODA drivers, or at the API level through
specific plugins. A new scenario opens up with
Galileo, based on Eclipse 3.5, which will furnish major
support for web services.
Essentially, WOA (see Fig. 4), which is a subset of SOA,
describes a core set of Web protocols, such as HTTP and
plain XML, as the most dynamic, scalable, and
interoperable Web service approach.
Jazz products embody an innovative approach to
integration based on open, flexible services and Internet
architecture. Unlike the monolithic, closed products of
the past, Jazz is an open platform designed to support
any industry participant who wants to improve the
software lifecycle and break down walls between tools.
A portfolio of products designed to put the team first
The Jazz portfolio consists of a common platform and a
set of tools that enable all of the members of the
extended development team to collaborate more easily.
The newest Jazz offerings are:
• Rational Team Concert is a collaborative work
environment for developers, architects and project
managers with work item, source control, build
management, and iteration planning support. It supports
any process and includes agile planning templates for
Scrum and the Eclipse Way.
• Rational Quality Manager is a web-based test
management environment for decision makers and
quality professionals. It provides a customizable solution
for test planning, workflow control, tracking and
reporting capable of quantifying the impact of project
decisions on business objectives.
• Rational Requirements Composer is a requirements
definition solution that includes visual, easy-to-use
elicitation and definition capabilities. Requirements
Composer enables the capture and refinement of
business needs into unambiguous requirements that drive
improved quality, speed, and alignment.
Jazz is not only the traditional software development
community of practitioners helping practitioners. It is
also customers and community influencing the direction
of products through direct, early, and continuous
conversation. Fig. 5 shows the DB2 on Campus project
community monitored using Jazz tools. The project
involved about 130 students and 4 thesis students,
and was stimulated by using the Team Concert
application. Jazz is also a process definition framework
including agile and personalized processes.
The only real difference between traditional SOA and the
concept of WOA is that WOA advocates REST, an increasingly
popular, powerful, and simple method of leveraging
HTTP as a Web service in its own right, with some plain old
XML to hold your data and state to top it all off.
Fig. 04- SOA and WOA comparison architecture
The WOA architecture emphasizes generality of interfaces
(UIs and APIs) to achieve global network effects through
five fundamental generic interface constraints:
1. Identification of resources
2. Manipulation of resources through representations
3. Self-descriptive messages
4. Hypermedia as the engine of application state
5. Application neutrality
This generalization enables us to easily match WOA
resources with the Mashup Pyramid (see Fig. 3).
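The first two constraints can be sketched in Java. The base URI and resource names below (`api.example.org`, `cities`) are hypothetical; only the HTTP verb mapping and the use of a stable URI per resource are standard REST practice. Note that `URLEncoder` performs form-style encoding, which is adequate for a sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch of WOA's generic interface: every resource is identified
// by a URI, and manipulation happens through the four standard
// HTTP verbs instead of service-specific operation names.
public class RestInterface {

    // Constraint 1 - identification of resources: a stable URI
    // per resource.
    public static String resourceUri(String base, String collection, String id) {
        try {
            return base + "/" + collection + "/" + URLEncoder.encode(id, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Constraint 2 - manipulation through representations: uniform
    // verbs, not per-service method names.
    public static String verbFor(String operation) {
        switch (operation) {
            case "create": return "POST";
            case "read":   return "GET";
            case "update": return "PUT";
            case "delete": return "DELETE";
            default: throw new IllegalArgumentException(operation);
        }
    }

    public static void main(String[] args) {
        String uri = resourceUri("http://api.example.org", "cities", "San Francisco");
        System.out.println(verbFor("read") + " " + uri);
    }
}
```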
6 Eclipse and Jazz
Another great advantage of using Eclipse is the ongoing
convergence between the Eclipse project and the Jazz
platform: the introduction of Jazz makes Eclipse a candidate
as a complete tool for collaborative development and for
managing the whole software life cycle. These innovations
agree perfectly with the mashup philosophy.
Jazz is an IBM initiative to help make software delivery
teams more effective; Jazz transforms software delivery,
making it more collaborative, productive and transparent.
The Jazz initiative is composed of three elements:
- An architecture for lifecycle integration
- A portfolio of products designed to put the team first
- A community of stakeholders.
An architecture for lifecycle integration
Fig.05 – Db2 on campus project - Jazz
7 CityInformation: a mashup example
using BIRT
To underline the real possibilities of Eclipse in mashup
development, we show CityInformation, a simple example
of how the Eclipse BIRT project can be used to realize a mix
of data belonging to different data sources.
CityInformation shows the user some information about
a user-chosen American city in the form of a BIRT
HTML report.
When the application starts, it asks the user to enter the
name of the city for which to display information (Fig. 06).
Fig.06 – Enter Parameters
Then the application invokes some free web services to
retrieve information about the city.
The WeatherForecast web service [cfr. Biblio12.]
supplies weather forecast information for the whole week,
together with the geographic position of the city. Longitude and
latitude are used to display the city map with Google
Maps, through a mashup with an external website. Under
the map, forecast information is displayed grouped by
day, showing an image and the expected temperatures.
The Amazon web service [cfr. Biblio14.] is used to
obtain a list of the best-selling travel guides for the city
on Amazon; each book is displayed to the user with
its own cover image.
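The map step can be sketched as follows: the latitude and longitude returned by the WeatherForecast service are turned into a map URL. The `center`, `zoom` and `size` parameter names follow Google's Static Maps API; the zoom and size values are arbitrary choices made for this illustration, and the actual report uses a mashup with an external website rather than necessarily this exact URL form.

```java
import java.util.Locale;

// Sketch: build a static-map URL from the latitude and longitude
// returned by the weather web service. Parameter names follow the
// Google Static Maps API; zoom and size are arbitrary here.
public class CityMap {

    public static String staticMapUrl(double latitude, double longitude) {
        return String.format(Locale.US,
                "http://maps.google.com/maps/api/staticmap?center=%.4f,%.4f&zoom=12&size=400x300",
                latitude, longitude);
    }

    public static void main(String[] args) {
        // Approximate coordinates of San Francisco.
        System.out.println(staticMapUrl(37.7749, -122.4194));
    }
}
```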
Fig. 07 shows the report obtained by requesting information
about the city of San Francisco.
8 Conclusions and future development
In this paper we presented our belief in mashups as a
solution to Situational Application development, and the
need for an integrated environment in which to exploit all
the possibilities offered by the mashup philosophy. We
believe that the Eclipse platform is a very good candidate for
this purpose, thanks to its modular and flexible
architecture, which makes it possible to manage every
abstraction level of the mashup pyramid in a simple way.
As future developments, we aim at integrating the first and
second levels of the mashup pyramid with the corresponding two
Eclipse levels, working with the next Galileo release of
Eclipse, expected by June 2009.
Fig. 07 – City Information Report on San Francisco
A common
project is also growing, bringing together the Universities of
Napoli and Salerno with IBM and their business partners,
with the aim of researching new mashup methodologies,
technologies and best practices. This collaboration is a
great opportunity to integrate knowledge belonging to
these different realities, mashing together open-source
solutions, university resources and technologies from
enterprise development environments, and to show
that Eclipse and mashups can be the basis on which to
build solutions to many problems of modern enterprises
and organizations.
References
10. Eclifox web site.
11. Weather Forecast webservice.
12. Google Maps API.
13. Amazon webservices.
Maresca, P. (2009). "La comunità eclipse italiana e la ricerca nel
web3.0: ruolo, esperienze e primi risultati". Slide show for the
Mashup meeting at the University of Salerno.
Raimondo, M. (2009). "Web 2.0: Modelli Sociali, Modelli di Business e".
IBM slide show for the Mashup meeting at the University of Salerno.
IBM developerWorks, Mashup section.
Merrill, D. (2006). "Mashups: The new breed of Web app". IBM
developerWorks.
Amsden, J. "Levels of Integration: Five ways you can integrate with
the Eclipse Platform".
Eclipse Rich Ajax Platform web site.
Eclipse Data Tools Platform web site.
Eclipse Business Intelligence and Reporting Tool (BIRT) web site.
Mashup learning and learning communities
Luigi Colazzo, Andrea Molinari
Dip. Informatica e Studi Aziendali
Università di Trento, Italy
[email protected]; [email protected]

Paolo Maresca
Dip. Informatica e Sistemistica
Università di Napoli Federico II, Italy
[email protected]

Lidia Stanganelli
DIST - University of Genoa, Italy
[email protected]
solution to the largest part of an enterprise's business
problems, there is the need for Situational Applications,
software built ad hoc to manage particular business
processes linked to different realities. Very often the
resources destined to the production of these applications
are limited, because of the lower relevance that they have
in the global mission. The tendency is to adopt low-quality
software or to use unconventional alternatives, using
software built for other purposes to achieve one's own goals.
Web 2.0, when it meets virtual communities (VC),
creates many issues when communities are closed, but has
great potential if communities take advantage of the inheritance
mechanism normally implemented in advanced virtual
community systems. When a community platform is in
place, the system should provide many basic services in
order to facilitate the interaction between community
members. However, every community has different needs,
every organization that implements a VC platform needs
some special services, and every now and then users or
organizations request new services.
The main difficulty in investing in the production of software
of this kind lies in the "artistic" and "social" nature of the
business processes to model, in the sense that their
particularity and specificity do not allow their
implementation in Enterprise Applications.
So, the VC environment is very fertile in terms of
personalizations, evolutions and new developments,
especially in learning settings. In order to fulfill these
growing requests, the developers of e-learning applications
have different possibilities: a) build the personalization
"from scratch"; b) create new web services for the new
requests; c) use a mashup approach to respond to the
requests. In this paper, we will explore the promising
perspectives of the latter option. Mashup is an interesting
approach to the development of new data and services, and we
will investigate its perspectives in the e-learning field.
Mashup seems to have great appeal, since it is devoted to
reuse, which is a typical activity in ongoing VC work.
So, the challenge is to provide very flexible, agile
and low-cost methods and processes to develop Situational
Applications, in order to exploit the business opportunity
represented by the "Long Tail" [0]. The possibilities
offered by web 2.0 technologies are among the most
accredited solutions to this problem. In this scenario,
mashups have great relevance.
A mashup is a lightweight web application
which allows users to remix information and functions
belonging to different sources and to work with them to
build software in a completely new, simple and quick way.
Users can efficiently model their own business processes
under their own vision of the problem, achieving a result so
particular and specific that it would be impossible to obtain
with older technologies.
1 Introduction
As an initial experiment we would like to discuss
mashup learning, since this could be one of
the real cases in which we need to adapt the learning
requirements to all user needs, using a concept
much more related to web services than the well-known
functional services offered by learning platforms. A learning
platform, in several cases, could be viewed as a "hibernated
knowledge collection" from which students can learn
Today, the Information Technology scenario is undergoing a
deep evolution under the unceasing pressure of the market,
which shows new needs every day. This change is led by the
technology evolution process, which offers innovative
business opportunities due to new discoveries.
The software production sector for enterprises is certainly
one of the scenarios most affected by this change: next
to the Enterprise Applications, developed by IT as
represent a good solution for accessing up-to-date data in a
quick and secure way.
without adding their own perceptions. In mashup
learning, everyone can add his or her personal knowledge by
using simple mashup primitives in a mashup learning environment.
At the immediately higher level live the APIs,
used to obtain data dynamically and services on demand.
A greater level of abstraction is achieved by Code Libraries,
which can be thought of as application frameworks and API
packages built to solve certain kinds of problems.
The next chapter will cover mashup primitives; Chapter 3
discusses virtual communities and the mashup
tendency growing around such environments; Chapter 4
discusses first results and states conclusions and
future developments.
Above the Code Library level stands the GUI Tools
level, made of widgets and technologies related to the
composition of small graphical applications to show data
or to allow access to a service.
2 Mashup and Eclipse
At the top of the pyramid there is the "Platform"
level, composed of all the tools and platforms that support
the building of mashup applications, allowing the composition
of single graphical elements and lower-level data. The model
shown in Fig. 1 is complex, but we have the possibility of
operating at each pyramid stage in order to build a flexible
and complete process. Obviously, we need a complete
and flexible development process built around a stable
technology such as Eclipse.
Mashups stand on the fundamental concept of data
and services integration; to operate in this way there are
three main primitives: Combination, Aggregation and
Visualization. The first primitive allows the collection of
data from heterogeneous sources and their use within the same
application; the second allows operation on the
collected data, measuring them and building new
information starting from them; the last is used to integrate
data in a visual way, using maps or other multimedia objects.
At this point the complexity of this model should be clear to
the reader, as should the need to act on each of the
pyramid levels in the application-building process in order
to obtain a flexible and complete development process for
a mashup application. Hence the need for an integrated
development environment, capable of adapting itself to every
kind of need thanks to its modular and flexible
architecture, allowing every aspect of the mashup
problem to be faced, and driving the developer through all the
production process up to the deployment and testing of the final
application.
From a technological view of the mashup and of its data
and services integration problem, a natural
representation of the problem can be obtained using
a layered, pyramidal approach (see Fig. 1).
These requirements are well satisfied by the open-source
development platform Eclipse, which can adapt
itself to every scenario thanks to its modular
architecture. Integrability has been one of the main directives of
the Eclipse project since its birth: the platform architecture
supports five different integration levels, as represented in the
following diagram.
The lowest level is "None – No Integration", which
represents the possibility of having no integration with
external tools when such integration is not needed.
The "Invocation" level represents the integration
obtained by invoking tools and services external to
Eclipse from within the platform itself. Services are executed as
external processes, distinct from the IDE's own, using the
Eclipse resource manager to start them. The platform
makes it possible to manage a tool-resource association
registry independent from the operating system's.
Fig.1 – The compared Mashup and Eclipse Pyramid
In the lowest abstraction layer there are Data
Feeds and the web technologies they involve. They
The "Data" level is certainly the one that offers the greatest
degree of integration. The Eclipse platform, in fact, allows
data to be collected from heterogeneous sources, given a
structure, and provided to one's own applications, in a
coherent and very flexible way.
across the different branches of the whole community system.
Moreover, this philosophy is the one we use in a
typical open-innovation network of users. In an open-innovation
group, an idea can rise and flow from one
community to another, allowing greater self-improvement
than a closed community can achieve. An example
will clarify this. In an academic institution, virtual
communities can normally be created simply by following the
traditional organizational structure of courses, i.e. (in an
Italian university):
The "API" integration level grafts perfectly onto the
Data level. The extreme flexibility of the Data level is
balanced by the need to decode and understand data and to
maintain their integrity. Using APIs allows data to be
accessed in a coherent, secure and especially dynamic way,
so the programmer is relieved of the burden of
managing data explicitly. With APIs comes the
introduction of the concept of service, intended as an
on-demand action on data. The modular structure of Eclipse
allows each application to define its own APIs and
services, which become usable by the platform itself and by
its components.
University – Faculty – Degree – Course ……
This means that we can have the course "Database",
which is part of the Master's Degree in Computer Science,
which is a community within the "Faculty of
Engineering" community, which is in turn a sub-community of
the "University of Trento" community. This structure has very
interesting properties for virtual communities, properties that
are typical of any hierarchy: inheritance, propagation,
multiple inheritance, polymorphism.
On the top of the pyramid there is GUI integration
level, which allows many tools or application to share the
platform Graphical User Interface becoming an unique
application perfectly integrated in the IDE structure,
starting from different applications.
In our virtual communities system, we have
another interesting property that is “trasversal
inheritance”. This means that a community under one
branch can inherit data, services or anything else from
another community in a different branch. Once again the
academic settings have a lot of these examples. Imagine
that the above course “Database” of the “Faculty of
Engineering” is held by the same teacher also for students
of another faculty. In our systems, this means that the
students enrolled in the second community should enroll
to the first, but that community is in a different branch
(Faculty) where normally they do not have access.
Trasversality among communities in different branches
allow us to create this effect.
3 Virtual communities
Virtual communities, when applied in organizations
(universities, companies, public administrations etc) have
a hierarchical structure in nature. This of course is not
exactly the typical idea of web 2.0, where contents can be
created and aggregated freely by people. A virtual
communities system applied to an organization normally
requests that the single community is a closed community,
where every member has been accepted by the community
administrator. This happened also in communities like the
ones built in our University, where initially, in the name of
free access to everyone, communities were opened.
However, after a while, it was clear that the community
(mostly associated with the metaphor “course”) should be
closed only to participants.
On this basis, the mashup ideas exposed above in
chapter 1 can offer interesting developments: imagine for
example the potentiality of a wiki, developed for the
community “database” above, that could be inherited by
the trasversal community of the same course held for the
students of the other faculty.
This structure of communities related with each other
in a hierarchy or in a net is by far more complex than a
“flat” architecture, where communities are sort of islands
in an archipelago, connected when and if they want. In a
virtual communities system like the one developed at the
University of Trento, communities are related because
they are part of a hierarchy (mostly determined by the
organization), but they can be related also trasversally
Another example is the typical situation where course
with high numbers of students are split into different subcourses, but of course the topics, the material, all the
services are shared among the different sub-course. If we
have a sub-community “database-a”, a subcommunity
“database-b”, all of them can create an wiki internal to the
sub community, but it would be very interesting to
aggregate these two wikis into one single wiki set at the
level of the parent community. This is a typical problem of
mashuping data coming from different communities that
have some hierarchical relationship between them.
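The hierarchy and transversal-inheritance properties described above can be sketched with a small data model. The following Python fragment is only an illustrative sketch (class and method names are our own invention, not part of the system developed at Trento): each community inherits services from its parent chain, and an explicit transversal link lets a community in one branch reuse a service defined in another branch.

```python
class Community:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent        # hierarchical link (University > Faculty > ...)
        self.own_services = set()   # services defined in this community
        self.transversal = []       # links to communities in other branches

    def services(self):
        """Own services plus those inherited from ancestors and transversal links."""
        inherited = self.parent.services() if self.parent else set()
        borrowed = set().union(*[c.own_services for c in self.transversal]) \
            if self.transversal else set()
        return self.own_services | inherited | borrowed

# University > Faculty of Engineering > course "Database"
university = Community("University of Trento")
engineering = Community("Faculty of Engineering", parent=university)
database = Community("Database", parent=engineering)
database.own_services.add("wiki")

# The same course held for another faculty, in a different branch
science = Community("Faculty of Science", parent=university)
database_sci = Community("Database (Science)", parent=science)
database_sci.transversal.append(database)   # transversal inheritance

print("wiki" in database_sci.services())    # the wiki flows across branches
```

Merging the wikis of "database-a" and "database-b" at the parent level would be the symmetric operation: the parent aggregating services upward from its children.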
Mashup learning
In the e-learning field, the word "mashup" can evoke different perspectives. The first association between the two words was probably made when the scientific community started to talk about E-learning 2.0. E-learning 2.0, of course, is strictly related to the Web 2.0 metaphor and the respective ideas of user participation in content production, social networks, blogs, wikis, etc. So, in the world of e-learning, the closest thing to a social network is a community of practice, where participants promote the interaction and collaboration of people inside the community.

In this environment of cross-fertilization between new Web 2.0 tools and e-learning, the basic idea of mashing up services and data, directing them to educational activities, is pretty straightforward. Mashup also has an appeal in terms of authoring environments, where the teacher is able to mash up digital contents originated from different sources.

So the crucial questions are the following:
• Are there any potential applications for mashup in the e-learning / collaboration fields?
• Are the current mashup technologies ready to allow users to create their own mashups in e-learning settings?
• If not, what is missing for the mashup philosophy to become a "killer application", or better, approach to e-learning development?
• How can authors' rights be identified and protected in mashup-enabled environments?
• On the other side, is it time to shift from closed innovation user networks (Web 1.0) to open innovation user networks (Web 2.0 and 3.0), taking advantage of the metaphor and tools available for virtual communities?
• How will service-oriented architecture impact the learning paradigm in the near future?
Of course there is a general response that could conclude the discussion: mashup is a very interesting and promising approach, and all the other difficulties will be overcome with time and market approval. Mashup editors, like Yahoo Pipes and IBM Lotus Mashup Maker, are available on the market (with different market strategies); they allow end users, even non-programmer end users, to mash up information sources and services to build new information that satisfies their long-term or immediate information needs.
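The kind of operation such editors support can be illustrated in a few lines of code. The sketch below uses plain Python with invented feed data (no real Yahoo Pipes or Lotus Mashup Maker API is involved): it joins two feeds and then filters the result, the two operations cited below as the hard part for end users.

```python
# Two hypothetical feeds, already fetched and parsed (e.g. from RSS sources).
course_feed = [
    {"title": "SQL basics", "course": "database", "year": 2009},
    {"title": "ER modelling", "course": "database", "year": 2008},
]
library_feed = [
    {"course": "database", "reading": "An Introduction to Database Systems"},
]

# Join step: attach the suggested reading to each item of the same course.
readings = {r["course"]: r["reading"] for r in library_feed}
joined = [dict(item, reading=readings.get(item["course"], ""))
          for item in course_feed]

# Filter step: keep only recent items, as a pipe-style filter would.
recent = [item for item in joined if item["year"] >= 2009]

print(recent[0]["title"], "->", recent[0]["reading"])
```

A graphical editor hides exactly these join and filter steps behind boxes and wires; the difficulty for non-programmers lies in composing them correctly, not in the single step.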
In general, mashup "ideas" and e-learning are in theory highly compatible; we believe therefore that the following argumentations can be accepted as a starting point for further studies:
• The general mashup concept is very interesting and promising for creating / integrating web services; e-learning settings demand this flexibility even more, in creating / adapting / personalizing services oriented to didactical activities.
• Mashup enables open innovation user network collaboration, which is a fertile way to let ideas and data flow from one community to another.
• Virtual communities are particularly fertile settings for new services created and made available even for the specific, detailed needs of a single community, and in situational applications.
• Some development environments, like Eclipse, are "philosophically" very close to the central ideas of mashup.
• E-learning settings are closely related to the mashup approach in the acquisition and authoring of educational material: teachers could create new, media-enriched learning objects taking advantage of mashup environments.
Though we agree with this general claim, our first experiments are showing some dark points, and some clarifications must be made in this area. For example, on the side of tools, with the increasing number of services and information sources available, and with the complex operations that mashup tools tend to stimulate (like filtering and joining), even an easy-to-use editor is not sufficient [12].
First of all, it must be clarified who the final user of mashups in learning settings is. Here follows a list of possible users of this new paradigm, ordered from the one more involved in technical operations (the programmer) to the less technical user that could mash up some e-learning services (the participant):
• the programmer, who will use enhanced development environments (like Eclipse) in order to rapidly develop mashup services from other, already existing services;
• the administrator of the e-learning platform, who will assemble on request some data or services extracted from the e-learning platform;
• the teacher, who, due to his/her specific didactical needs, is allowed to use some mashup platform (like Yahoo Pipes™) to create new services / data for his/her activities;
• the participant, who uses mashup techniques to gather data or services for his/her educational activities.

Since mashup includes both processes and products, it also implies new methodologies and tools: distributed system architectures, such as peer-to-peer or service-oriented ones, and prototyping tools for cooperation and collaboration, such as the Eclipse platform and Jazz.

As you can see, the panorama is quite variegated, with different levels of involvement, technical complexity and final objectives. In the case study we are using to understand and deepen this topic, i.e. our virtual community platform, of course our first problem regards the
Conclusions and future development
In this paper we showed our belief in mashups as a promising new approach for e-learning settings, specifically those more oriented to creating a collaborative environment, like virtual communities. Mashup applications/environments/tools are interesting from many different perspectives, from that of the producers of contents (teachers or, in Web 2.0 settings, the end users) to that of the producers of services / technologies involved in e-learning (programmers, administrators, teachers with particularly innovative ideas). So mashup could be the ideal solution for Situational Application development, where there is a precise need for an integrated environment in which to exploit all the possibilities given by the mashup philosophy.

We believe that, in the latter perspective, open and innovative development platforms like Eclipse could be the perfect candidates for this purpose. In particular, mashup environments require a modular and flexible architecture, allowing the users to manage every abstraction level of the mashup pyramid in a very simple way.

The field is still in its infancy; there are a lot of promising aspects but also dark points, especially from the end-user perspective: mashup could also be seen as a land of confusion, of imprecise construction, a fertile ground for chaos in learning objects and learning services.
For this reason, further studies are required, especially from an experimental and technical point of view. For this purpose, a common project is also growing, grouping together the Universities of Napoli and Trento with the aim of researching the newest mashup methodologies, technologies and best practices.

One great advantage of mashing up on these VC systems regards, as mentioned, the possibility of creating new services for the final users very quickly, just by approaching a mashup-enabled development platform. This in some way resembles the times of "software reuse" and "software as a component", and in some respects is contiguous to concepts like SaaS approaches. The difference, anyway, lies mainly in the tools, in the general approach to the construction of new services, and in the technicalities that allow a mashup-enabled platform to be efficient for programmers.

Regarding the last possible end users of mashup e-learning, i.e., non-technical users like teachers or participants, of course this is at the moment more a dream than a concrete perspective. What we would like to stress is the potential of this approach. Imagine, for example, the general idea of mashup applied to the construction of educational material, or to the creation of didactical paths that the participants can build with an easy-to-use approach where the contents are aggregated (graphically?) from different, web-based content resources.

The idea shown here could be strictly connected with the process (or didactical path) that sustains the material. We mean that when mashing up resources we could mash up also the process that sustains them. We need to have Process Re-Engineering Process (PREP) as another mashup experimentation area, with its own primitives.
[0] Anderson C. (2004), "The Long Tail: Why the Future of Business Is Selling Less of More", ISBN 1-4013-0237-8.
[1] Maresca P. (2009), La comunità eclipse italiana e la ricerca nel web 3.0: ruolo, esperienze e primi risultati. Slide show for Mashup meeting at University of
[2] Raimondo M. (2009), Web 2.0: Modelli Sociali, Modelli di Business e Tecnologici, IBM. Slide show for Mashup meeting at University of Salerno.
[3] IBM
[4] Duane Merrill (2006), Mashups: The new breed of Web app, IBM developerWorks.
[5] IBM website
[6] Jim Amsden, Levels of Integration: Five ways you can integrate with the Eclipse Platform.
[7] Eclipse web site
[8] Eclipse web site
[9] Eclipse Business Intelligence and Reporting Tool web site
[10] Eclipse Rich Ajax Platform web site
[11] Eclifox web site
[12] Elmeleegy H., Ivan A., Akkiraju R., Goodwin R., "Mashup Advisor: A Recommendation Tool for Mashup Development", in: ICWS '08, IEEE International Conference on Web Services, Beijing, Sept. 2008, pp. 337-344, ISBN: 978-0-7695-3310-0.
[13] Marc Eisenstadt, "Does Elearning Have To Be So Awful? (Time to Mashup or Shutup)", Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007), 2007, pp. 6-10.
J-META: a language to describe software in Eclipse
Pierpaolo Di Bitonto(1), Paolo Maresca(2), Teresa Roselli(1), Veronica Rossano(1), Lidia Stanganelli(3)
(1) Department of Computer Science – University of Bari, Via Orabona 4, 70125 Bari – Italy
(2) Dipartimento di Informatica e Sistemistica – Università di Napoli "Federico II", Via Claudio 21, Napoli – Italy
(3) DIST – University of Genoa, Viale Causa 13, 16145 Genova, Italy
{dibitonto, roselli, rossano}, [email protected], [email protected]
Abstract— Information retrieval is one of the main activities in different domains such as e-commerce, e-learning or document management. Searching in large amounts of data faces two main problems: the suitability of the results with respect to the user's request, and the quantity of the results obtained. One of the most popular solutions to this problem is to define more and more effective description languages. In order to allow the search engine to find the resource that best fits the user's needs, which can be very specific, detailed descriptions are needed. Finding the right level of description is the current challenge of research in the e-learning, e-commerce and document management domains. For instance, a teacher may search for a Learning Object (LO) about a simulation of a chemical reaction in order to enrich his/her courseware. Thus, the LO description should not contain only information about title, authors, time of fruition, and so on, but should contain more specific information such as the type of content, learning prerequisites and objectives, the teaching strategy implemented, the students addressed, and so forth. In an open source scenario, the problem is the same. In fact, a developer needs to find a software component which must be integrated into an existing system architecture. In order to find the right component, technical issues should be described. It would be interesting if the open source web site supplied an intelligent search engine able to select the software according to the developer's requirements. This is possible only if detailed descriptions of each resource are supplied. The paper proposes a description language, named J-META, to describe software resources (plug-ins) in the Eclipse open community.

Keywords— community; metadata
Information retrieval is one of the main activities in different domains such as e-commerce, e-learning or document management. Searching in large amounts of data faces two main problems: the suitability of the results with respect to the user's request, and the quantity of the results obtained. One of the most popular solutions to this problem is to define more and more effective description languages. The greater the complexity of the resource to be searched, the greater the complexity of the description. For example, on the WWW there are search engines which index resources using their content (e.g. Google for web pages), and search engines that need extra details, such as title, authors, time, context, etc. (e.g. YouTube for multimedia resources).

In order to allow the search engine to find the resource that best fits the user's needs, which can be very specific, detailed descriptions are needed. Finding the right level of description is the current challenge of research in the e-learning and document management domains. For instance, a teacher may search for a Learning Object (LO) about a simulation of a chemical reaction in order to enrich his/her courseware. Thus, the LO description should not contain only information about title, authors, time of fruition, and so on, but should contain more specific information such as the type of content, learning prerequisites and objectives, the teaching strategy implemented, the students addressed, and so forth.

In an open source scenario, the problem is the same. A developer needs to find a software component which must be integrated into an existing system architecture. In order to find the right component, technical issues (such as goals, functionalities, hardware and software requirements, relationships with other components, the kind of licence, etc.) should be described. This kind of information may be contained in the software documentation (if it exists), but the developer would spend a lot of time reading about and installing all the software available on an open source web site. It would be interesting if the open source web site supplied an intelligent search engine able to select the software according to the developer's requirements. This is possible only if detailed descriptions of each resource are supplied. Therefore, in order to allow the search engine to supply the most suitable resources for the user's requests, it is necessary to work on a large number of descriptors.

The paper proposes a description language, named J-META, to describe software resources (plug-ins) in the Eclipse open community. Eclipse is an open source community whose projects are focused on building an open development platform comprised of extensible frameworks, tools and runtimes for building, deploying and managing software across the lifecycle. The Eclipse Foundation is a not-for-profit, member-supported corporation that hosts the Eclipse projects and helps cultivate both an open source community and an ecosystem of complementary products and services [1].

Since 2001, the Eclipse community has been growing quickly. Nowadays, the community counts 180 members, including public administrations, small and big companies, universities and research centres, and 66 software projects; the Eclipse platform is used as a development platform in more than 1300 products. Eclipse is a leader among Java development environments, with about 2 thousand users over the world [2]. There are a lot of local communities, including an Eclipse Italian community that manages about 10 projects.
The main idea of the Eclipse community is to share ideas, knowledge and experiences. In this context, the J-META language presented herein aims at defining a set of specifications which allows Eclipse community members to describe and find plug-ins easily.

The state of the art of metadata languages has highlighted the lack of languages to describe software resources, in particular plug-ins. So, to define the J-META language, the description languages defined in other domains, such as e-learning and the librarian world, have been studied in order to obtain useful guidelines and suggestions for the J-META language definition. The languages studied (Dublin Core [3], IEEE Learning Object Metadata [4], Text Encoding Initiative [5], etc.) have pointed out some advantages and disadvantages related to description languages and their use in real contexts. For the sake of simplicity, just two of these languages will be described: Text Encoding Initiative (TEI) and IEEE Learning Object Metadata (LOM). The TEI language was born in the librarian world to describe textual resources, and scholarly texts in particular; the LOM language was defined in the e-learning context to describe didactic resources.

The TEI standard has been used to develop many encoded data sets, ranging from the works of individual authors to massive collections of national, historical, and cultural literatures [5]. The TEI includes: (1) analysis and identification of categories and features for encoding textual data at many levels of detail; (2) specification of a set of general text structure definitions that is effective, flexible, and extensible; (3) specification of a method for in-file documentation of electronic texts that is compatible with library cataloguing conventions and can be used to trace the history of a text, thus assisting in authenticating its provenance and modifications; (4) specification of encoding conventions for special kinds of texts or text features (character sets, general linguistics, dictionaries, spoken texts, hypermedia).

The LOM standard is a set of IEEE specifications that serves to describe teaching resources or their component parts. It includes more than 80 descriptive elements subdivided into the following 9 categories:
• General: this includes all the general information that describes the resource as a whole. The descriptors in this group include: title, structure, aggregation level.
• Lifecycle: this groups the descriptors of any subsequent versions of the LO and of its current state.
• Meta-metadata: this includes the information on the metadata themselves.
• Technical: this indicates the technical requisites needed to run the LO and the technical characteristics of the LO itself, such as the format or size of the file.
• Educational: this contains the pedagogical and educational features of the LO. This is the most important category and contains elements like Interactivity type, Learning resource type, Semantic density and Typical learning time, which supply indications on how the resource can be used in teaching.
• Rights: this indicates the intellectual property rights and any conditions of use of the LO, such as cost, as well as the information on copyright.
• Relation: this describes any relations (of the type "is a part of", "requires", "refers to") with other LOs.
• Annotation: this allows the insertion of comments about the use of the LO in teaching, including an identification of who wrote the annotation.
• Classification: this makes it possible to classify the LO according to a particular classification system.

The analysis of the different description languages has pointed out two important suggestions for the definition of J-META. In particular, TEI suggested the level of detail for the description of the plug-in and each of its single components; LOM suggested the data model.
The state of the art of metadata languages presented in the previous section raises two questions about the most important aspects of the description language definition: what kind of software should be described? Which grain size must be chosen to describe the software?

Concerning the kind of software that should be described (and then searched), the plug-in has been chosen as the minimum unit, because both functions (in the functional paradigm) and classes (in the object-oriented paradigm) have too many references to other functions or classes, and this makes it difficult to reuse the software in different contexts. On the contrary, the plug-in is independent from external code; in the worst case it depends on other plug-ins. Concerning the grain size of plug-in descriptions, a fine level has been chosen, describing the functions or classes that compose the plug-in in order to describe the software resources with accuracy.
As Figure 1 shows, J-META is composed of five main categories:
• General: the general information that describes the plug-in as a whole;
• Lifecycle: the features related to the history and the current state of the plug-in;
• Technical: the technical characteristics of the plug-in;
• Rights: the intellectual property rights and conditions of use of the plug-in;
• Code: the analysis and design diagrams and the code description of the plug-in.
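Since J-META is defined, like LOM, as a hierarchy of descriptors, a plug-in description can be pictured as a nested document. The fragment below is only a hypothetical sketch of such an instance (the element spellings beyond those defined in the paper, and all values, are our assumption), built with Python's standard xml.etree so that the nesting of the five categories is explicit:

```python
import xml.etree.ElementTree as ET

# Hypothetical J-META instance: the five top-level categories of Figure 1.
root = ET.Element("jmeta")
for category in ("general", "lifecycle", "technical", "rights", "code"):
    ET.SubElement(root, category)

# A couple of General leaf elements, as defined in the paper.
general = root.find("general")
ET.SubElement(general, "identifier").text = "0000AA"
ET.SubElement(general, "title").text = "j-viewer"

print(ET.tostring(root, encoding="unicode"))
```

Only the leaf nodes carry values; the categories themselves are aggregates, mirroring the LOM-style data model described below.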
Figure 1. Main categories of the J-META language

Like the LOM data model, J-META is a hierarchy of data elements, including aggregate data elements and simple data elements (the leaf nodes of the hierarchy). The first level of the hierarchy is subdivided into subcategories which describe the plug-in details. Each subcategory is composed of a number of data elements, each of them aiming at describing a particular plug-in issue. Only leaf nodes have individual values. For each single data element, J-META defines:
• Name: the name of the data element;
• Explanation: the definition of the data element;
• Size: the number of values allowed for the data element;
• Values: the set of allowed values for the data element;
• Type of data: indicates whether the values are String, Date, Duration, Vocabulary or Undefined.

For the sake of simplicity, the data elements of each main category will be presented with a short description and examples in order to clarify their use in a real context.

A. General category

The General category is composed of 8 data elements:
• identifier: a globally unique label that identifies the plug-in (for instance 0000AA, 00001, …);
• title: the name of the plug-in;
• authors: the name (or names) of the plug-in developer(s). It is an aggregate element that can contain one or more elements (author) to list all the authors involved;
• description: a description of the plug-in content;
• keywords: words (min 1 – max 10) that indicate the technique or technology used in the development of the plug-in (no controlled vocabulary is defined);
• sector: the area of pertinence of the plug-in; it describes the area of the service. The possible values are listed in a controlled vocabulary and are those suggested by the classification of plug-ins on the central Eclipse site (Application Management, Application Server, Build and Deploy, …);
• annotations: gives users the possibility of inserting comments or recommendations about the use of the plug-in. It is an aggregate element with the following children elements:
  – name: the name of the user that gives the comment;
  – object: the object of the comment;
  – comment: the body of the comment;
  – rating: the user's global evaluation of the plug-in (expressed as a numeric value from 1 up to 5);
• download: the number of downloads of the plug-in.

Figure 2. General category of the J-META language

B. Lifecycle category

The Lifecycle category is composed of 4 data elements:
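A General category instance can also be checked against the constraints stated in the text (1–10 keywords, rating between 1 and 5). The sketch below is hypothetical: element names follow the paper, while all values and the nesting of repeated keywords elements are our assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical General category instance for an invented plug-in.
general = ET.Element("general")
ET.SubElement(general, "identifier").text = "00001"
ET.SubElement(general, "title").text = "j-viewer"
authors = ET.SubElement(general, "authors")
ET.SubElement(authors, "author").text = "An Author"
ET.SubElement(general, "description").text = "A viewer plug-in."
for kw in ("java", "swt"):                       # min 1 - max 10 keywords
    ET.SubElement(general, "keywords").text = kw
annotations = ET.SubElement(general, "annotations")
note = ET.SubElement(annotations, "annotation")
ET.SubElement(note, "name").text = "a user"
ET.SubElement(note, "object").text = "installation"
ET.SubElement(note, "comment").text = "Works fine on release 3.2."
ET.SubElement(note, "rating").text = "4"
ET.SubElement(general, "download").text = "1500"

# Checks mirroring the constraints stated in the text.
assert 1 <= len(general.findall("keywords")) <= 10
assert 1 <= int(note.find("rating").text) <= 5
```

Constraints of this kind are what would let a validator reject malformed descriptions before they reach the search engine.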
• version: an alphanumeric string (max 10 characters) that indicates the version of the plug-in (for instance 3.2 or 3.2alpha);
• state: the state of progress of the plug-in; the possible values are (1) not-complete – the plug-in project has just been started and needs to be completed; (2) draft – the plug-in is almost complete but has not been tested; (3) complete – the plug-in has been released after a test phase; (4) neglected – the plug-in is incomplete and the project closed;
• releaseDate: the release date of the plug-in; the format is mm/dd/yyyy;
• contributes: an aggregate element used to track possible contributions or recommendations, like changes or improvements from other developers. The contributes element can have one or more children (contribute), all identified by the attribute contribute_id. Each contribute element has three children elements: (1) contributor, which indicates the name of who modified the plug-in; (2) data, which indicates when the contribute was released; (3) description, which describes the contribution.
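A Lifecycle fragment with a tracked contribution might look as follows. The instance is hypothetical (the values and the contribute_id are invented); the snippet parses it and reads the contribution back, matching the structure just described:

```python
import xml.etree.ElementTree as ET

# Hypothetical Lifecycle instance with one tracked contribution.
lifecycle_xml = """
<lifecycle>
  <version>3.2alpha</version>
  <state>draft</state>
  <releaseDate>09/10/2009</releaseDate>
  <contributes>
    <contribute contribute_id="c1">
      <contributor>A Developer</contributor>
      <data>10/01/2009</data>
      <description>Fixed the export wizard.</description>
    </contribute>
  </contributes>
</lifecycle>
"""

lifecycle = ET.fromstring(lifecycle_xml)
contribute = lifecycle.find("contributes/contribute")
print(contribute.get("contribute_id"), "-", contribute.find("contributor").text)
```

Reading contributions back in this way is what would allow a community site to show the change history of a plug-in alongside its description.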
Figure 3. Lifecycle category of the J-META language

C. Technical category

The Technical category is composed of the following main elements:
• requirements: the technical capabilities (hardware and software) necessary for using the plug-in. Each constraint is expressed by a requirement element (which can occur one or many times) that is a father node; its children are (1) type, the type of technology required to use the plug-in (i.e. hardware, software, network, etc.); (2) name, the name of the required technology; (3) minVersion, the lowest possible version of the required technology;
• installationRemarks: describes how to install the plug-in and the possible problems that could arise, with their solutions;
• documentationRepository: pointers to web links where good documentation about the plug-in can be found. Web 2.0 tools are welcomed;
• size: a numeric value that expresses the size of the plug-in in bytes; this element refers to the actual size of the plug-in, not the compressed one;
• location: a unique resource identifier on the Web; it may be a location (e.g., Universal Resource Locator) or a method that resolves to a location (e.g., Universal Resource Identifier);
• eclipseRel: the Eclipse release used to develop the plug-in (it uses a controlled vocabulary with values such as Callisto, Europa, Ganymede, …);
• dependency: the relationship between the plug-in and other plug-ins; it has a single child element (resource) that can occur one or more times and describes the related plug-in. Children of the resource element are: (1) name, which identifies the name of the related plug-in (i.e. j-viewer); (2) reference, which indicates a unique resource identifier on the Web; (3) description, which explains the relations among related plug-ins (i.e. the plug-in needs j-viewer to use .jar files).

Figure 4. Technical category of the J-META language
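One use of the Technical descriptors is letting a search engine check, before suggesting a plug-in, whether its declared dependencies are themselves described. The sketch below is our own illustration (the plug-in names and the registry are invented), working on names as would be extracted from the dependency/resource elements:

```python
# Hypothetical registry of described plug-ins and their declared dependencies,
# as extracted from the dependency/resource elements of each description.
registry = {
    "j-viewer": [],
    "j-report": ["j-viewer"],   # j-report needs j-viewer to use .jar files
}

def dependencies_satisfied(plugin, registry):
    """True if every plug-in reachable through dependency links is described."""
    seen = set()
    stack = [plugin]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current not in registry:
            return False        # an undescribed dependency was reached
        stack.extend(registry[current])
    return True

print(dependencies_satisfied("j-report", registry))   # all dependencies described
print(dependencies_satisfied("j-missing", registry))  # unknown plug-in
```

A recommendation tool like the one in [12] performs far richer analyses, but even this simple closure check already exploits the dependency descriptors.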
D. Rights category

The Rights category is composed of three data elements:
• cost: indicates whether the use of the plug-in requires any payment (only boolean values are accepted);
• licence: the kind of software licence. The possible values are defined in a controlled vocabulary that contains the existing open software licences (GNU [6], ASL [7], BSD [8], …);
• conditions of use.

Figure 5. Rights category of the J-META language

Figure 6. Code category of the J-META language (*comments is not a leaf node)
E. Code category

The Code category is composed of 40 data elements grouped into two main categories: progrApproaches, which describes the programming approach used (declarative, procedural, object-oriented, functional, etc.), and diagrams, which describes the code at a high abstraction level using the UML diagrams defined during the analysis and design phases. The diagrams category is defined to help programmers better understand the source code. This is very important if a programmer needs to modify and/or extend an existing plug-in, or needs to build a new plug-in that depends on an existing one. These situations are very common in an open source community like Eclipse. From the source code point of view, the plug-in is a complex object that can use different programming paradigms, for instance Java code (procedural paradigm) and Prolog code (declarative paradigm), or C code (functional paradigm) and Java code (object-oriented paradigm). This heterogeneity requires a flexible description structure adaptable to different contexts.

For this reason, the data element progrApproaches has the progrApproach element as a child element that can occur one or more times; each progrApproach element has declarative and procedural as children elements.

The declarative element describes the declarative approach used for the plug-in development. In particular, it uses three elements: (1) language, which describes the programming language used (for instance Prolog, Clips [9], etc.); (2) inferEngine, which specifies the inference engine used for the plug-in development (for instance SWI-Prolog [10] or SICStus Prolog [11]); (3) description, which describes the goals of the code and how it works.

The procedural element describes the procedural approach used for the plug-in development. In particular, it is articulated into functions or objOriented elements, according to the different programming approaches. If the functional paradigm is used, it is possible to describe, using the functions element, all the functions that compose the plug-in.
In particular, using the function element it is possible to
describe general information about the each single function and
other information about its organization. General information
(general) are: (1) name, the name of the function, (2)
author, (3) description, (4) keywords and (5)
comments. For each comment, it is possible to specify the
user who makes the comment, the object, the body and the
user’s global evaluation (from 1 up to 5). The information
about the organization of the function are described in:
functionScope, which indicates if the function is public or
private; inputData and outputData, which describe the
name and the type of input and output data of the function
If the Object-Oriented (OO) paradigm is used, J-Meta can
describe the classes used in the plug-in. It is possible to
specify both general information such as (1) name, (2)
author, (3) description, (4) keywords, (5)
comments, and detailed information about the class and/or
the plug-in organization. Concerning the class
organization, it is possible to specify, using classScope, whether
the class is public or private, together with its attributes and
methods. For each attribute, it is possible to describe the name
of the attribute, its scope (public or private) and a textual
explanation; for each method it is possible to specify the name,
the scope (public or private), the name and type of input and
output data and a short explanation.
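As a sketch of how such a description might look in practice, the following Python fragment builds a hypothetical J-Meta fragment for one function with the standard xml.etree.ElementTree module. The element names (progrApproaches, progrApproach, procedural, functions, function, general, functionScope, inputData, outputData) come from the text above; the exact nesting, the attribute layout and all the sample values are assumptions, since the schema itself is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Build a hypothetical J-Meta fragment describing one function of a plug-in.
# Element names follow the paper; the exact nesting is an assumption.
progr_approaches = ET.Element("progrApproaches")
approach = ET.SubElement(progr_approaches, "progrApproach")
procedural = ET.SubElement(approach, "procedural")
functions = ET.SubElement(procedural, "functions")
function = ET.SubElement(functions, "function")

# General information: name, author, description, keywords, comments.
general = ET.SubElement(function, "general")
ET.SubElement(general, "name").text = "parseModel"
ET.SubElement(general, "author").text = "J. Smith"
ET.SubElement(general, "description").text = "Parses the plug-in model file."
ET.SubElement(general, "keywords").text = "parser, model"
comment = ET.SubElement(general, "comments")
comment.set("user", "reviewer1")
comment.set("evaluation", "4")  # global evaluation from 1 to 5
comment.text = "Clear and well documented."

# Organization: scope plus input/output data.
ET.SubElement(function, "functionScope").text = "public"
input_data = ET.SubElement(function, "inputData")
input_data.set("name", "modelPath")
input_data.set("type", "String")
output_data = ET.SubElement(function, "outputData")
output_data.set("name", "model")
output_data.set("type", "Model")

xml_text = ET.tostring(progr_approaches, encoding="unicode")
```

A description like this could then be indexed by a search engine, which is precisely the use case the paper argues for.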
elements (leaf nodes of the hierarchy). The first level of the
hierarchy is subdivided into subcategories which describe the
plug-in details. Each subcategory is composed of a number
of data elements, each of them aiming at describing a particular
plug-in issue.
In the Eclipse Community there is no description
language for plug-ins, so developers are forced to read
the documentation of the different plug-ins in order to find the
resource that best fits their needs. The introduction of J-Meta
will improve the search engine performance and the users’
The next step will be the validation of J-Meta within
the Eclipse Italian Community, and then in the worldwide
Eclipse Community.
The paper proposes a description language, named J-Meta,
to describe software resources (plug-ins) in the Eclipse open
community with the aim of improving the accuracy of the
search process. The main problems of searching
in the Eclipse Community are the large number of plug-ins and
the complexity of the resources.
The precision of a search is strictly connected to the
description of the resource being searched for: the more
detailed the description of the resource, the higher the
precision of the search. On the basis of these premises the
language J-Meta has been defined. J-Meta allows describing
a software plug-in from different points of view such as
goals, functionalities, hardware and software requirements,
relationships with other components, kind of licence, etc.
From the structural point of view, J-Meta is a hierarchy of data
elements, including aggregate data elements and simple data
Paolo Maresca. Project and goals for the Eclipse Italian Community.
(2008) International Workshop on Distance Education Technologies
(DET'2008), part of the 12th International Conference on Distributed
Multimedia Systems, Boston, USA, 4-6 September, 2008.
[3] Dublin Core Metadata Initiative (DCMI)
[4] IEEE
[5] Ide, N. and Sperberg-McQueen, C. The TEI: History, goals, and future.
Computers and the Humanities 29, 1 (1995), 5–15.
[6] GNU General Public License
[7] Apache License, Version 2.0
[8] Open Source Initiative OSI - The BSD License:Licensing
[9] Clips
[10] SWI Prolog
[11] SICStus Prolog
Providing Instructional Guidance with IMS-LD in
COALA, an ITS for Computer Programming
Francisco Jurado, Miguel A. Redondo, Manuel Ortega
University of Castilla-La Mancha
Computer Science and Engineering Faculty
Paseo de la Universidad 4
13071 Ciudad Real, Spain
+34 295300
{Francisco.Jurado, Miguel.Redondo, Manuel.Ortega}
Abstract—Programming is an important competence for the
students of Computer Science. These students must acquire
knowledge and abilities for solving problems and it is widely
accepted that the best way is learning by doing. On the other
hand, computer programming is a good research field where
students should be assisted by an Intelligent Tutoring System
(ITS) that guides them in their learning process. In this paper, we
will present how we have provided guidance in COALA, our ITS
for programming learning, merging Fuzzy Logic and IMS
Learning Design.
Keywords-Problem Based Learning, Intelligent Tutoring Systems,
Adaptive environments, Instructional planning
Obtaining Computer Programming competence implies that
students of Computer Science should acquire and develop
several abilities and aptitudes. It seems to be widely accepted
that the best way to acquire that competence is learning by
doing [1]. Therefore, students must solve programming
problems presented strategically by the teacher. These students
use computers in order to develop the learning activities which
the teacher has specified. Thus, we think this makes it an ideal
environment for Computer Assisted Learning (CAL), where
students are assisted by an Intelligent Tutoring System (ITS)
that guides them in their learning process, helping them to
improve and to acquire the abilities and aptitudes they should
develop, leaving behind the slow trial-and-error process.
An ITS allows adapting the learning process to each
student. For this, the ITS relies on determining the
student's cognitive model, so it can determine the next
learning activity for each specific student. ITSs are usually used
together with Adaptive Hypermedia Systems (AHS) to provide
“intelligent” navigation through the educational
material and learning activities. These systems that merge ITS
and AHS are the so-called Adaptive Intelligent Educational
Systems (AIES). Several examples of systems that integrate
AHS and ITS for programming learning can be found in ELM-ART [4], InterBook [3], KBS-Hyperbook [13] or AHA! [5].
On the other hand, the growth of Web-Based Education
(WBE) environments has made groups such as IEEE LTSC,
IMS Global Learning Consortium, or ADL, work to provide a
set of standards allowing reusability and interoperability into
the eLearning industry, setting the standard base where
engineers and developers must work to achieve eLearning
systems integration.
One of those specifications is IMS Learning Design (IMS-LD) [8], proposed with the aim of centering on cognitive
characteristics and on the learning process, allowing the
learning design process to be isolated from the learning objects.
In this paper, we will show how we have extended our
distributed environment for Programming Learning called
COALA (Computer Assisted Environment for Learning
Algorithms), integrating an IMS-LD engine into it. This
approach merges, on the one hand, AHS techniques used in
the systems mentioned previously by using eLearning standards
specifications, and on the other hand, Artificial Intelligence
techniques for enabling the ITS to lead students to achieve the
abilities and aptitudes they need for their future work.
The paper is structured as follows: firstly, an overview
about what we must take into account for providing
instructional adaption (section 2); secondly, an explanation of
our instructional model (section 3); then, the assessment or
evaluation service the system uses will be shown (section 4);
next, we will go deeply into some implementation issues and
how the system works (section 5); finally, some concluding
remarks will be made (section 6).
Our aim is to provide an approximation that allows the
creation of ITS considering student cognitive model and the
instructional strategy needed to teach the lesson. In this sense,
it is necessary to use techniques from AHS, as summarized in
Brusilovsky’s taxonomy [2]:
problem. This must provide a mechanism for managing the
imprecision and vagueness with which both teacher and student
specify the solution.
This artifact model, which analyzes the solution, interacts
with the student cognitive model for updating it, reflecting the
evidence of knowledge that has given shape to the solution
developed by the student. In this way, in accordance with the
student’s work, the instructional adaptation can be achieved,
deciding the next learning activity to be proposed. Thus, the
learning activities are shown as a consequence of how the
student solves problems.
Figure 1. Models in our Approach.
A user model based on concepts: this consists of a set
of concepts with attributes such as the degree of
knowledge. Then, for instance, in AHA! [5], visiting a
webpage implies incrementing the knowledge attribute
for the concept dealt with in that webpage. Also,
updating the knowledge attribute can be propagated to
other concepts.
Adaptive link hiding: this means that a set of Boolean
expressions can be defined based on values from the
user model. With this, the showing and hiding of a link
can be evaluated.
Conditional inclusion of fragments: this introduces a
set of conditional blocks that allows the appearance or
not of text fragments.
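The three techniques above can be illustrated with a minimal Python sketch. The concept names, the gain value and the propagation weights are invented for illustration; AHA! itself implements these mechanisms differently in detail.

```python
# Minimal sketch of Brusilovsky-style adaptation over a concept-based
# user model. Concept names and weights are illustrative only.
user_model = {"loops": 0, "recursion": 0}

# Knowledge propagation: working on one concept also raises the
# knowledge attribute of related concepts, scaled by a weight.
propagation = {"recursion": [("loops", 0.5)]}

def visit_page(concept, gain=10):
    user_model[concept] += gain
    for related, weight in propagation.get(concept, []):
        user_model[related] += gain * weight

# Adaptive link hiding: a Boolean expression over the user model
# decides whether a link is shown.
def link_visible(required_concept, threshold):
    return user_model[required_concept] >= threshold

# Conditional inclusion of fragments: each fragment carries a condition.
def render(fragments):
    return " ".join(text for cond, text in fragments if cond())

visit_page("recursion")
page = render([
    (lambda: link_visible("recursion", 5), "[Advanced recursion link]"),
    (lambda: not link_visible("loops", 10), "Reminder: revise loops first."),
])
```

After one visit, the recursion concept is fully credited and the related loops concept receives half the gain, so the rendered page shows the advanced link together with the reminder.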
In instructional design, we must provide learning activity
sequencing. Thus, Brusilovsky’s taxonomy will be adopted in
our proposal, defining the following models: the student
cognitive model, the instructional model and the artifact model.
In the top left-hand corner of figure 1, we can see the
student cognitive model. It consists of a set of evaluations for
each task the student has to solve and represents the cognitive
state of the student at every moment. This matches the
concept-based user model taken from Brusilovsky’s
taxonomy. In other words, it specifies what parts of the domain
the student knows and to what degree.
In the top right-hand corner, the figure shows the
instructional model. This model allows specifying the
instructional strategy or learning design to be applied. In other
words, the instructional model represents the learning activity
flow. It will be adapted depending on the evaluations stored in
the student cognitive model. This matches the adaptive link
hiding and conditional inclusion of fragments from Brusilovsky’s
taxonomy. Thus, learning activities substitute links and
fragments: for example reading a text, designing quizzes,
multimedia simulations, chats, etc.
So, as an implementation proposal, we use IMS-LD in our
instructional model, and Fuzzy Logic [16] in our artifact model
for the evaluation process [9] [10]. In the following sections,
we will show in detail how we have implemented these models
in our system.
As we have previously stated, we need learning activity
sequencing to set our instructional model. Since our aim is to
develop an ITS that allows applying instructional strategies
according to the subject to be learned/taught, we propose the
use of IMS-LD [8] to specify the method that must be used in
the teaching/learning process [14], that is, to specify the
instructional adaptation model.
IMS-LD can be used for developing adaptive learning (AL)
[15]. This is because an IMS-LD specification can be enriched
with a collection of variables from the student profile. These
variables allow specifying conditions to determine if a learning
activity or a branch of the learning flow (a set of learning
activities) is shown to or hidden from a specific student. It can
be done in the following way: each method has a section for
defining conditions that points out how it must adapt itself to
specific situations through rules like those shown in figure 2. In
that example code, if the student knowledge about a concept is
less than 5, then the activity A1 is hidden and the activity A2 is
shown; in the opposite case, activity A1 is shown and activity
A2 is hidden.
In our system, the variables used for defining the adaptation
rules in the condition section of an instructional method are
obtained from the student model. In our case study
(programming learning), the evidence must obtain its value
from the artifact (algorithm) developed by the student. In the
next section, we explain how to evaluate the algorithms that
students design as a result of the learning activities and how
their cognitive model is updated.
IF student knowledge less-than 5
THEN hide activity A1 and show activity A2.
Among the learning activities, a problem to be solved with
an algorithm can appear. These algorithms should be analyzed,
assessed and evaluated. Then, to support this, a model that
manages the solution must be considered. This will be the
artifact model shown in the bottom right-hand corner of figure
1. This model supports the processing and analysis of
artifacts (algorithms) developed as a solution for a proposed
ELSE show activity A1 and hide activity A2.
Figure 2. Example of the Rule for Adapting an Instructional Design.
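The semantics of a rule like the one in figure 2 can be sketched as a simple condition evaluator over properties taken from the student model. This is a hedged illustration of the adaptation logic only; it is not CopperCore's actual API, and the property name and threshold are assumptions.

```python
# Evaluate a figure-2 style rule: if knowledge < 5 then hide A1 and
# show A2, otherwise show A1 and hide A2. Names are illustrative.
def adapt_activities(student_knowledge, threshold=5):
    if student_knowledge < threshold:
        return {"A1": "hidden", "A2": "shown"}
    return {"A1": "shown", "A2": "hidden"}

weak_student = adapt_activities(3)    # below the threshold
strong_student = adapt_activities(7)  # at or above the threshold
```

In the real specification, such a rule lives in the conditions section of an IMS-LD method, and the LD engine re-evaluates it whenever the referenced property changes.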
We thus obtain a fuzzy representation of that ideal approximated
algorithm, that is, an ideal approximated algorithm
fuzzy representation that solves a concrete problem (at the top
of figure 3).
Algorithms that students have written (on the right of the
figure) will be correct if they are instances of that ideal
algorithm fuzzy representation. Knowing the degree of
membership for each software metric, obtained from the
algorithm written by students in the correspondent fuzzy set for
the ideal approximated algorithm fuzzy representation, will
give us an idea of the quality of the algorithm that students
have developed.
With this method, we have an artifact model that manages
imprecision and vagueness; furthermore, it is based on solid
engineering practice (software engineering).
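The evaluation idea can be sketched as follows: each metric of the teacher's approximate algorithm induces a trapezoidal fuzzy set, and a student algorithm is scored by its membership degrees in those sets. The metric names, the trapezoid parameters and the aggregation by minimum are all illustrative assumptions, not the paper's exact formulation.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Default trapezoids centred on the metrics of the teacher's ideal
# approximate algorithm (parameter values invented for illustration).
ideal = {"lines_of_code": (10, 15, 25, 30), "cyclomatic": (1, 2, 4, 5)}

def evaluate(student_metrics):
    """Degree to which the student algorithm matches the fuzzy ideal."""
    degrees = {m: trapezoid(student_metrics[m], *ideal[m]) for m in ideal}
    return min(degrees.values()), degrees

quality, per_metric = evaluate({"lines_of_code": 20, "cyclomatic": 3})
```

A student algorithm whose metrics fall on the plateaus of every trapezoid scores 1.0; metrics drifting toward the edges of the teacher-adapted sets lower the score gradually, which is exactly the graceful degradation the fuzzy approach is meant to provide.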
Figure 3. Evaluating the Student Algorithm.
Up to this point, we have explained how a generic ITS can
be implemented, taking into account the student model and the
instructional model which must be followed. In our case, we
want to apply this ITS to a programming learning environment.
Thus, the learning activities that the IMS-LD player will show
to students can be programming problems. So, a model that
evaluates the algorithm delivered as a solution is necessary.
Our proposal is explained in [9] [10] and briefly shown in
figure 3. In this figure, the teacher writes an implementation for
the ideal approximate algorithm that solves a problem (on the
bottom left of the figure). Next, several software metrics that
shape its functionality will be calculated. In this way, we obtain
an instance of the ideal approximated algorithm. After that, the
fuzzy set for each metric will be established in the following
way: initially, each fuzzy set will be a default trapezoidal
function around the metric value from the approximate
algorithm; the teacher can adapt each fuzzy set to indicate
whether an algorithm is correct or not. From this, we obtain a
collection of fuzzy sets that characterize the algorithm.
Thus, the system will have the evaluation of the algorithm
developed by the student as feedback. This can be used by the
teacher for re-writing or adapting both the learning design and
the approximated ideal fuzzy representation of the algorithm in
order to improve the system.
For the implementation, we have taken COALA (Computer
Assisted Environment for Learning Algorithms) as a starting
point [12]. COALA has been developed as a customized
Eclipse application by means of plug-ins. It is an Integrated
Development Environment (IDE) that is not so different from
the one that students will find in their future work. That is, it
doesn’t use virtual environments or simulation tools, but
employs a real-world IDE. COALA allows the distribution of
programming tasks or assignments to students, the
asynchronous downloading of such assignments, local
elaboration, uploading, annotation and feedback to teachers
and students.
Figure 4. IMS-LD as a Service.
Figure 5. Customized Eclipse Framework.
As a communication engine, COALA implements a
blackboard architecture by using a Tuple Space (TS) server. A
TS server is basically a shared memory in which clients can
store and retrieve information formatted as tuples [6]. The
COALA plug-in for the Eclipse environment allows
communication by means of a TS implementation called
SQLSpaces, developed at the University of Duisburg-Essen [7].
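The store-and-retrieve pattern behind this architecture can be sketched with a toy in-memory tuple space. SQLSpaces itself offers a client/server API over the network, so the class below is only an illustration of the pattern COALA relies on, not the real SQLSpaces interface; the tuple contents are sample values.

```python
# Toy blackboard: clients store tuples and retrieve them by template,
# where None acts as a wildcard field. Illustration only, not the
# SQLSpaces API used by COALA.
class TupleSpace:
    def __init__(self):
        self.tuples = []

    def write(self, *fields):
        self.tuples.append(tuple(fields))

    def read(self, *template):
        """Return the first tuple matching the template (None = wildcard)."""
        for t in self.tuples:
            if len(t) == len(template) and all(
                p is None or p == f for p, f in zip(template, t)
            ):
                return t
        return None

ts = TupleSpace()
# Step 1: the teacher uploads a task.
ts.write("task-1", "Sort a list", "test cases...", "fuzzy repr...")
# Step 5: a student uploads a solution.
ts.write("student-7", "task-1", "solution code...")

task = ts.read("task-1", None, None, None)
solution = ts.read(None, "task-1", None)
```

Because every producer and consumer (teacher, students, evaluator, LD engine proxy) only ever reads and writes tuples, new components can be plugged into the blackboard without touching the existing ones, which is the extension property exploited later in the paper.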
To provide the corresponding guidance, we have chosen the
main LD engine available nowadays: CopperCore.
As communication middleware, CopperCore uses Enterprise
Java Beans (EJB) or Web Services (WS). So we have
implemented a proxy that translates the necessary API and
allows communication with our Tuple Space server.
Following our explanation, figure 4 shows the different
steps and messages (tuples) between the teacher, the students,
the evaluator module and the LD engine by means of the TS
server. So, as we can see in figure 4, in the beginning, the
teacher specifies an assignment using his/her COALA
environment (figure 5). After this, the teacher uploads this
assignment by sending a tuple with the form <task id, task
description, test cases, fuzzy representation> (step 1) using the
“Send Task to TS” action in the plug-in. At that moment, the
task is available to all the students in the classroom.
The tasks the teacher had uploaded must be those specified
in the corresponding IMS-LD which has been previously
loaded in CopperCore. On the other hand, the LD engine uses
the proxy to talk to the TS server making use of the tasks stored
on it. Then, CopperCore can send the TS server a tuple with the
form <user id, run id, activity tree> which contains the
corresponding activity tree extracted from the IMS-LD
specification for each student (step 2). At this time, all the tasks
and the corresponding activity trees for each student are available.
Therefore, the students can, first of all, download their
activity tree specification using the “Learning design” view in
their COALA environment (figure 5, on the left) reading the
<user id, run id, activity tree> tuple (step 3). Secondly,
following this activity tree, the students are able to download
the corresponding assignment onto their workspace reading the
tuple <task id, task description, test cases, fuzzy
representation> previously uploaded by the teacher (step 4),
using their “Download Task from TS” action menu in the plug-in. Then, each student can work out the task by writing the
code, compiling, etc.
Once the students have finished the assignment, they can
send their results to the server from where they can be
downloaded and reviewed by the teacher. Students upload the
solution to the server sending a tuple with the following
content: <user id; task id; solution code> (step 5). The teacher
will be notified about the task sent, and can check the code
written by the student on his/her computer by reading all the
tuples with the form <user id; task id; solution code> (step 6)
from the server. Now the teacher can see the task in his/her
“Notification view” (figure 5, on the bottom right).
As previously mentioned, we have implemented an
evaluator module that reads the tuples the students have sent,
that is, the same ones the teacher reads (step 6), and processes
the code for obtaining a set of metrics and an evaluation
explanation (as presented in section 3). These calculated
metrics are sent to the Tuple Space server with the form <task
id; user id; metric1; metric2; ... metricN>. Also, an explanation
associated with each metric is sent in a tuple with the following
format: <task id; user id; explain metric1; explain metric2; ...
explain metricN> (step 7). Then, both the teacher and the
students can read the software metrics and the corresponding
explanations from the server and analyze them (step 8). So,
during their programming, students can use the tests created by
the teacher and ask the system for an automatic evaluation to
check their solution (figure 5, on the bottom left).
At the same time, (step 8) the proxy reads the evaluation for
the task and informs CopperCore that a property has changed
for a user. Then, CopperCore processes this change and
updates the activity tree for the concrete user. The update in the
activity tree tuple fires a notification to the student COALA
instance. This notification informs COALA that the new
activity tree is available and it will be downloaded as in step 4.
So, the student can follow his/her activity tree and download
the next task.
Throughout this paper, we have shown how we have
created an ITS by merging techniques from AHS with AI
techniques. The paper starts by analyzing the necessary
techniques from the AHS. We have adopted these AHS
techniques by means of a set of models which are: the student
cognitive model for determining which parts of the domain the
student knows, the instructional model for adapting the
learning activities flow depending on the knowledge the
student has, and the artifact model for evaluating the students’
solutions to assignments. This last model will update the
student cognitive model as a consequence of the learning
activity flow.
So, starting from our distributed environment, called
COALA (Computer Assisted Environment for Learning
Algorithms), which enables the distribution, monitoring,
assessment and evaluation of assignments, we have shown
how we have added, without difficulty, CopperCore as an IMS-LD engine. This was possible by integrating a new component
in its blackboard distributed architecture, proving the extension
capabilities of our architecture.
Thus, we have an ITS that allows adapting the learning
process to each student, taking into account the results of the
delivered assignment. So, as future work we intend to test the
system in scenarios where an adaptation is needed, and then
check if the system provides the correct one.
This research work has been partially supported by the
Junta de Comunidades of Castilla-La Mancha, Spain, through
the projects AULA-T (PBI08-0069) and M-GUIDE
(TC20080552) and Ministerio de Educación y Ciencia, Spain,
through the project E-Club-II (TIN-2005-08945-C06-04).
Bonar, J. & Soloway, E., “Uncovering Principles of Novice
Programming”, in POPL '83: Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, ACM,
1983, pp. 10-13.
Brusilovsky, P., “Adaptive Hypermedia”, in User Modeling and
User-Adapted Interaction, Vol. 11, nr. 1-2, Kluwer Academic Publishers,
2001, pp. 87-110
Brusilovsky, P., Eklund, J., and Schwarz, E., “Web-Based Education for
All: A Tool for Developing Adaptive Courseware”, in Proceedings of the
Seventh International World Wide Web Conference, 1998, pp. 291-300
Brusilovsky, P., Schwarz, E.W. and Weber, G., “ELM-ART: An
Intelligent Tutoring System on World Wide Web”, in Intelligent
Tutoring Systems, 1996, pp. 261-269
De Bra, P., Aerts, A., Berden, B., de Lange, B., Rousseau, B.,
“AHA! The Adaptive Hypermedia Architecture”, in Proceedings of
HT’03, 2003
Gelernter, D., “Generative Communication in Linda”, ACM
Transactions on Programming Languages and Systems, 7(1), 1985, pp
Giemza, A., Weinbrenner, S., Engler, J., Hoppe, H.U., “Tuple Spaces as
Flexible Integration Platform for Distributed Learning Environments”,
in Proceedings of ICCE 2007, Hiroshima (Japan), November 2007. pp.
IMS-LD, “IMS Learning Design. Information Model, Best Practice and
Implementation Guide, XML Binding, Schemas. Version 1.0 Final
Specification”, Technical report, IMS Global Learning Consortium Inc.,
Online, 2003
Jurado, F.; Redondo, M. A. & Ortega, M., “Fuzzy Algorithm
Representation for its Application in Intelligent Tutoring Systems for the
Learning of Programming”, in Rogério PC do Nascimento; Amine
Gerqia; Patricio Serendero & Eduardo Carrillo (ed.), EuroAmerican
Conference On Telematics and Information Systems, EATIS'07 ACM-DL Proceedings, Association for Computing Machinery, Inc (ACM),
Faro, Portugal, 2007
Jurado, F.; Redondo, M. A. & Ortega, M., “Applying Approximate
Reasoning Techniques for the Assessment of Algorithms in Intelligent
Tutoring Systems for Learning Programming” (in Spanish), in Isabel
Fernandez de Castro (ed.), VII Simposio Nacional de Tecnologías de la
Información y las Comunicaciones en la Educación (Sintice'07),
Thomson, Zaragoza, Spain, 2007, pp. 145-153
Jurado, F.; Redondo, M. A. & Ortega, M., “An Architecture to Support
Programming Algorithm Learning by Problem Solving”, in Emilio
Corchado; Juan M.Corchado & Ajith Abraham, (ed.), Innovations in
Hybrid Intelligent Systems, Proceedings of Hybrid Artificial Intelligent
Systems (HAIS07), Springer Berlin Heidelberg New York, Salamanca,
Spain, 2007, pp. 470-477
Jurado, F.; Molina, A. I.; Redondo, M. A.; Ortega, M.; Giezma, A.;
Bollen, L. & Hoppe, H. U., “COALA: Computer Assisted Environment
for Learning Algorithms”, in J. Ángel Velázquez-Iturbide; Francisco
García & Ana B. Gil, (ed.), X Simposio Internacional de Informática
Educativa (SIIE'2008), Ediciones Universidad Salamanca, Salamanca
(España), 2008
Nejdl, W. and Wolpers, M., “KBS Hyperbook—A Data Driven
Information System on the Web”, in WWW8 Conference, Toronto, 1999
Oser, F.K. and Baeriswyl, F.J., “Choreographies of Teaching: Bridging
Instruction to Learning”, in Richardson, V. (ed.), Handbook of Research
on Teaching. 4th Edition. McMillan - New York, 2000, p. 1031-1065
Towel, B. and Halm, M., Learning Design: A Handbook on Modelling
and Delivering Networked Education and Training, Springer-Verlag,
chapter 12 - Designing Adaptive Learning Environments with Learning
Design, 2005, pp. 215-226
Zadeh, L. , “Fuzzy Sets”, in Information and Control, Vol. 8, 1965, pp.
Deriving adaptive fuzzy learner models for Learning-Object recommendation
G. Castellano, C. Castiello, D. Dell’Agnello, C. Mencar, M.A. Torsello
Computer Science Department
University of Bari
Via Orabona, 4 - 70126 Bari, Italy
castellano, castiello, mencar, danilodellagnello, [email protected]
also known as adaptive e-learning systems, personalization
plays a central role, being devoted to tailoring learning contents according to the specific interests of learners in order to provide highly personalized learning sessions [4]. To achieve
this aim, the individuality of each learner has to be taken into
account so as to derive a learner model that encodes his/her
characteristics and preferences. The derived learner model
can successively be exploited to select, among the variety of available Learning-Objects (LOs), those that match
the interests of the individual learner. Therefore, in order
to develop an adaptive e-learning system, two main activities should be carried out: (i) the automatic derivation of
learner models starting from the information characterizing
the preferences of each learner and (ii) the recommendation
of LOs on the basis of the learner model previously derived.
Typically, learner models are derived through the analysis of the navigational behavior that each learner exhibits
during his/her interactions with the system. Obviously, the
interests and needs of learners may evolve during the learning process. This is an important aspect that has to be
considered in order to derive learner models that may be
adapted over time so as to capture the changing needs of
each learner [3].
In addition, learner preferences are heavily permeated by
imprecision and gradedness. In fact, learner interests have a
granular nature: they cannot be referred to specific
LOs but, rather, they cover a range of somehow similar LOs (e.g. typically a learner may prefer one or more LOs
about similar or related topics). Moreover, learner characteristics apply to learning resources with graduality, that is,
a characteristic applies to a LO on the basis of a compatibility degree. In other words, there is a compatibility degree
between learner preferences and LOs which may vary gradually. As an example, the interest of a learner in “Computer
Science” may apply to a LO about the “Web” and to a LO concerning “Computer Architecture” with different compatibility degrees.
A mathematical framework suitable to represent and
handle such imprecision and gradedness is Fuzzy Set The-
Adaptive e-learning systems have been growing in popularity in
recent years. These systems can offer personalized learning experiences to learners, by supplying each learner with
learning contents that meet his/her specific interests and
needs. The efficacy of such systems is strictly related to
the possibility of automatically deriving models encoding
the preferences of each learner, analyzing their navigational behavior during their interactions with the system.
Since learner preferences may change over time, there is
the need to define mechanisms of dynamic adaptation of
the learner models so as to capture the changing learner
interests. Moreover, learner preferences are characterized
by imprecision and gradedness. Fuzzy Set Theory provides
useful tools to deal with these characteristics. In this paper
a novel strategy is presented to derive and update learner
models by encoding preferences of each individual learner
in terms of fuzzy sets. Based on this strategy, adaptation is
continuously performed, but in earlier stages it is more sensitive to updates (plastic phase) while in later stages it is
less sensitive (stable phase) to allow Learning-Object suggestion. Simulation results are reported to show the effectiveness of the proposed approach.
1 Introduction
In the age of knowledge, e-learning represents the most
important and revolutionary way to provide educational services at any time and place. Today, in every kind of learning environment, the learner covers a key role: he/she has become
the main protagonist of his/her learning pathway, opening a new
challenge for current systems, which necessarily have to adapt
their services to suit the variety of learner needs [1]. This
trend has led to the development of user-centred e-learning
systems whose main aim is to maximize the effectiveness of learning by supplying an individual learner with personalized learning material [5, 6]. In this kind of system,
ory (FST) [8, 10], based on the idea of fuzzy sets, that are
basic elements suitable for representing imprecise and gradual concepts. FST provides fuzzy operators that can be used
to combine, aggregate and infer knowledge from fuzzy sets.
In this work, we propose a strategy that derives learner
models representing learner preferences of each learner in
terms of fuzzy sets. The strategy is able to dynamically
adapt models to the changing learner preferences so as to
recommend similar LOs at the next accesses of a learner.
The adaptation of learner models is performed continuously
via a process that comprises two phases: a plastic phase,
during which the adaptation process is more sensitive to
updates, and a stable phase, in which adaptation is less sensitive so as to allow LO recommendation. The two-phase
adaptation process guarantees the convergence to a learner
model that can be used to suggest new LOs that are compatible with the specific learner preferences.
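The two-phase idea can be given a minimal sketch: the model reacts strongly to new evidence early on and weakly later. The decay schedule, the phase threshold and the update rule below are invented for illustration; the actual adaptation strategy is formalized in section 3.

```python
# Two-phase adaptation sketch: high sensitivity during the plastic
# phase, low sensitivity during the stable phase. All parameter
# values are illustrative assumptions.
def sensitivity(step, plastic_steps=10, high=0.5, low=0.05):
    return high if step < plastic_steps else low

def update(preference, observed, step):
    """Move the stored preference degree toward the observed degree."""
    alpha = sensitivity(step)
    return preference + alpha * (observed - preference)

p = 0.0
for step in range(30):
    # The learner keeps interacting with LOs fully compatible with
    # a given preference, so the observed degree is 1.0.
    p = update(p, 1.0, step)
```

After the plastic phase the stored degree is already close to the observed one, and the small stable-phase updates keep the model from oscillating while still tracking slow drifts in the learner's interests.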
The paper is organized as follows. In section 2 the approach proposed for modeling learners is briefly described,
along with the basic mechanism used to associate LOs to
learners according to their preferences. In section 3 the
strategy for the adaptation of learner models is formalized. Section 4 shows some simulation results to prove the effectiveness of the proposed approach. Finally, section 5 closes the
paper by drawing some conclusions.
can formalize the following kinds of attributes:
• attributes with crisp values, such as the Dimension (expressed in KB) of a LO;
• attributes with collective values, such as the Topic of a
LO, which can assume categorical values (e.g. “Computer Science”, “Economy”, “Business”, ... );
• attributes with imprecise values, such as the Fruition
Time required by a LO, which can be expressed by
vague concepts such as LOW, MEDIUM or HIGH.
One key feature of the proposed model is the possibility to
easily formalize imprecise properties, thus favoring a mechanism of information processing that is in agreement with
human reasoning schemes [9].
In the following subsections, the description of both LOs
and learner models is detailed.
2.1 Description of Learning-Objects
Each LO is defined by a collection of fuzzy metadata,
i.e. a set of couples <attribute, f value> where attribute
is a string denoting the name of an attribute and f value
is a fuzzy set defined on the domain of the attribute. An
example of fuzzy metadata is:
⟨Complexity, {Low/1.0, Medium/0.8, High/0.2}⟩

2 The proposed approach
The main idea underlying our approach is to describe
a learner model using the same representation used to describe LOs. This provides a straightforward mechanism to
recommend LOs to users on the basis of a compatibility degree. The common representation shared between learner
models and LOs is based on metadata describing specific
attributes. Unlike conventional metadata specifications, that
assume attribute values to be precise (crisp), we allow attribute values to be vague (fuzzy) by using a representation
based on fuzzy sets.
The theory of fuzzy sets [8] basically modifies the membership concept: a set is characterized by a membership
function that assigns to each object a grade of membership
ranging in the interval [0,1]. In this way, fuzzy sets allow
a partial membership of their elements and they are appropriate to describe vague and imprecise concepts. The use
of fuzzy sets together with particular mathematical operators defined on fuzzy sets provides a suitable framework for
handling imprecise information. Since LO’s (and learner’s)
attributes may be vague and imprecise, the employment of
fuzzy sets to define their values can be of valuable help,
leading to a description based on the so-called fuzzy metadata. Fuzzy metadata provide a general description of attributes related to a LO, characterized by both precise and
vague properties. In particular, using fuzzy metadata, we
which means that the attribute Complexity is defined by a
fuzzy set comprising three fuzzy values: Low that characterizes (belongs to) the attribute with membership degree
equal to 1.0; M edium with membership degree equal to
0.8 and High with membership degree equal to 0.2.
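As an illustration (ours, not from the paper), such fuzzy metadata over discrete domains can be held in ordinary dictionaries, with each attribute name mapping to a fuzzy set of value/membership pairs:

```python
# Sketch: a LO as a dict of fuzzy metadata. Each attribute name maps to a
# fuzzy set, itself a dict from domain values to membership degrees in [0, 1].
lo = {
    "Complexity": {"Low": 1.0, "Medium": 0.8, "High": 0.2},
    "Topic": {"Computer Science": 1.0, "Economy": 0.3},
}

# Dict keys are unique, so each attribute occurs at most once by
# construction, matching the constraint on LO descriptions below.
assert all(0.0 <= d <= 1.0 for mu in lo.values() for d in mu.values())
```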
More formally, denoting by A the set of all possible attributes, we define a fuzzy metadata as a couple ⟨a, μ⟩, where a ∈ A is an attribute and μ : Dom(a) → [0, 1] is a fuzzy set defined on Dom(a) (the set of all possible values of attribute a). Then, a learning resource LO is described by a set of fuzzy metadata, i.e.

LO = {⟨a, μ⟩ | a ∈ A}    (1)

with the constraint that each attribute occurs at most once in the description:

∀ ⟨a′, μ′⟩, ⟨a″, μ″⟩ ∈ LO : a′ = a″ → μ′ = μ″
A very simple example of LO description is reported in figure 1. It can be seen how fuzzy metadata extend classical metadata, since they can describe precise as well as imprecise properties characterizing the attributes of a LO. The attribute “Title”, for example, has a crisp nature, hence it is represented as a singleton fuzzy set “Java basis course” with full membership degree. The attribute “Specific topics” is characterized by collective values, hence it is described by a fuzzy set enumerating two values: Programming, with membership degree 0.8, and Operating Systems, with membership degree 0.2. This means that this LO deals mainly with Programming and, to a lesser extent, with Operating Systems. The attribute “Complexity” has a granular nature, hence it can be defined by enumerating three values (LOW, MEDIUM and HIGH) with different membership degrees. Finally, the attribute “Fruition time” has an imprecise and continuous nature, hence it is described by a fuzzy set characterized by a trapezoidal membership function defined on the domain of time (expressed in minutes).

⟨Title, { Java basis course / 1.0 }⟩
⟨General topics, { Computer Science / 1.0 }⟩
⟨Specific topics, { Programming / 0.8, Operating Systems / 0.2 }⟩
⟨Complexity, { LOW / 1.0, MEDIUM / 0.8, HIGH / 0.4 }⟩
⟨Fruition time, { Trapezoidal(15,30,60,90) }⟩

Figure 1. An example of LO description

2.2 Description of learner models
Learner models are used to represent the preferences of each individual learner accessing the e-learning system. Precisely, a learner model reflects the preferences the learner has for one or more attributes of the accessed LOs. We define a learner model as a collection of components, where each component represents an elementary preference that is characterized in terms of fuzzy sets, likewise the fuzzy metadata specification used for LO description. This homogeneity enables a very direct matching between model components and LOs, so as to derive a compatibility degree useful for LO recommendation.
Formally, a learner model is defined as:

P = {p1 , p2 , . . .}    (3)

where each component pi is represented using a LO description, i.e.

pi = {⟨a, μ⟩ | a ∈ A}

A learner model is initially empty (i.e. it has zero components); it then grows incrementally by adding a component or updating the existing components each time the learner accesses a new LO. This dynamic adaptation of learner models is described in section 3.
In fig. 2, an example of learner model with two components is reported. We may interpret this model as a learner with two different types of interests. The first component indicates that the learner is interested mainly in Fuzzy Set Theory and, to a lesser extent, in Neural Networks. The second component indicates that the same learner is mainly interested in Java and, to a minor extent, in Smalltalk and C++. It also indicates that preferred LOs should be mainly targeted to undergraduate students, while LOs targeted to researchers and graduate students are not of main interest for this learner.

⟨Topics, { Fuzzy Set Theory / 1.0, Neural Networks / 0.8 }⟩
⟨Genre, { theoretical / 1.0, applicative / 0.1, survey / 1.0 }⟩

⟨Topics, { C++ / 0.2, Java / 0.8, Smalltalk / 0.3 }⟩
⟨Target, { researcher / 0.1, undergraduate / 1.0, graduate / 0.5 }⟩

Figure 2. An example of learner model with two components

2.3 Matching mechanism
Given a Learning-Object description LO defined as in (1) and a learner model P defined as in (3), we define a matching mechanism to compute a compatibility degree between LO and P that is as high as the learning resource is deemed compatible with the learner’s interests and preferences.
The overall compatibility degree K(LO, P) of a learning resource LO to a learner model P is a value in [0, 1] defined in terms of the compatibility between LO and each component of P. Namely, we define:

K(LO, P) = max_{p∈P} K(LO, p)

We use the ‘max’ operator since we express the overall compatibility as a disjunction of the elementary compatibilities computed between the LO and the single model components. The compatibility degree between a LO and a component p is defined by matching the fuzzy metadata shared by the LO and the component, that is:

K(LO, p) = AVG{ K(μ_LO, μ_p) | ∃a ∈ A s.t. ⟨a, μ_LO⟩ ∈ LO ∧ ⟨a, μ_p⟩ ∈ p }

where AVG is the standard mean, used as a particular case of aggregation operator, and K(μ_LO, μ_p) is the compatibility degree computed between two fuzzy sets. To evaluate the compatibility degree between two fuzzy sets, we adopt the Possibility measure [7], which evaluates the overlapping between fuzzy sets as follows:

K(μ_LO, μ_p) = sup_{x∈Dom(a)} { min(μ_LO(x), μ_p(x)) }

The possibility measure evaluates the extent to which there exists at least one common element between two fuzzy sets. This measure is particularly suitable to quantify compatibility between fuzzy metadata, since we assume that two metadata are compatible if they share at least one value of a given attribute.
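Under the assumption of discrete attribute domains, the matching mechanism can be sketched as follows (function and variable names are ours, not the paper's):

```python
def possibility(mu_lo, mu_p):
    """Possibility measure: the sup (max, for discrete domains) of the
    pointwise min of two membership functions, each given as a dict."""
    domain = set(mu_lo) | set(mu_p)
    return max((min(mu_lo.get(x, 0.0), mu_p.get(x, 0.0)) for x in domain),
               default=0.0)

def k_component(lo, p):
    """K(LO, p): average possibility over the attributes shared by LO and p."""
    shared = set(lo) & set(p)
    if not shared:
        return 0.0
    return sum(possibility(lo[a], p[a]) for a in shared) / len(shared)

def k_model(lo, model):
    """K(LO, P): max (disjunction) over the model components."""
    return max((k_component(lo, p) for p in model), default=0.0)
```

For instance, a LO with Topics {Java/0.8} matched against a component preferring {Java/1.0} yields min(0.8, 1.0) = 0.8, the degree to which the two fuzzy sets share a common value.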
3 Adapting learner models
In order to derive a model that captures the preferences of a learner during his/her interaction with the e-learning system, we propose a strategy that dynamically creates and updates learner models over time.
For each learner, the proposed adaptation strategy starts with an empty model. During the adaptation process, the model is dynamically updated as the considered learner accesses the learning resources. The approach used for deriving and updating a learner model resembles a competitive learning procedure [2], with some variations necessary for dealing with the various components of the learner model.
The adaptation process works as follows. Given a learner, his/her model is initially defined as an empty set, i.e. P ← ∅. Next, whenever a learning resource LOt is accessed by the learner at time t, the model is updated in the following way. For each model component p ∈ P, the compatibility degree between LOt and p is computed and the model component giving the maximum degree is selected, that is:

p∗ = arg max_{p∈P} K(LOt, p)

If the compatibility degree K(LOt, p∗) is low (i.e. it is less than a fixed threshold δt¹), it means that there is no compatibility between the learning resource LOt and the existing model components, hence a new model component is added to P using the same metadata of LOt, i.e.:

P ← P ∪ {LOt}

Conversely, if the compatibility degree is high (i.e. K(LOt, p∗) ≥ δt), the model component p∗ is updated so as to resemble LOt. The update concerns all attributes and is performed according to the following rules. For each a ∈ A, we denote by μ_p∗^a the fuzzy set in metadata ⟨a, μ_p∗^a⟩ ∈ p∗ (if the attribute a is not used in the model component, we consider the degenerate fuzzy set, i.e. the fuzzy set such that μ_p∗^a(x) = 0 for each x ∈ Dom(a)). Similarly, we define the fuzzy set μ_LOt^a in metadata ⟨a, μ_LOt^a⟩ ∈ LOt. The fuzzy set μ_p∗^a is updated as

μ_p∗^a(x) ← (1 − αt) μ_p∗^a(x) + αt μ_LOt^a(x)

for each x ∈ Dom(a). The new fuzzy set μ_p∗^a results from a linear combination of its older version and the fuzzy set μ_LOt^a. We can observe that if αt = 0 no update takes place; on the other hand, if αt = 1 the previous definition of μ_p∗^a is replaced with μ_LOt^a. The parameter αt is tuned dynamically so as to favor adaptation during the earlier stages of the process. We refer to this early phase as the plastic phase. As t increases, we make the adaptation less influential so as to stabilize the model components. We refer to this last phase as the stable phase. To achieve this behavior, the parameter αt varies according to the following law:

αt = exp(−α(t − 1))

where the value of α is set empirically so that αt is greater than 0.5 for t < 0.1N, N being the estimated number of total accesses a learner makes to the system. In other words, according to the frequency of learner accesses to the LOs, we estimate that the first 10% of time is used only to create the learner model (LOs are not suggested during this initial stage) while the remaining time is used to update the model as well as to recommend LOs to the learner. This can be achieved by setting:

α = 10 log 2 / (N − 10)

¹ The value of δt serves to establish whether to create a new model component or update an existing one. We observe that for δt = 0 no new model component is created, independently of the value of the compatibility degrees between LOs and the existing model components. On the other hand, for δt = 1, new model components are created for every distinct LO accessed by the learner. In this work, we choose δt = 0.5 for t ≤ 0.1N and δt = 0 for t > 0.1N, where N is the estimated number of total accesses a learner makes to the system. In this way, new model components are generated only in the initial phase, whenever incompatible LOs are accessed by the learner.

4 Preliminary simulation results
The proposed approach for deriving adaptive learner models was tested in a simulated environment. The simulation was aimed at verifying the ability of our approach to create several model components that correspond to distinct preferences of a learner.
We randomly generated 100 LOs with uniform distribution. We assumed that each LO was characterized by the presence of five attributes, conventionally named a1, a2, a3, a4 and a5. Each attribute had a three-valued domain, i.e. Dom(ai) = {v1, v2, v3}. To verify the ability to derive different model components from the same learner, we defined an ideal model made up of three components, as follows:

p1 = {⟨a1, {v1/1}⟩, ⟨a2, {v2/1}⟩}
p2 = {⟨a3, {v3/1}⟩, ⟨a4, {v3/1}⟩}
p3 = {⟨a2, {v2/1}⟩, ⟨a4, {v3/1}⟩}
A linguistic interpretation of the model might be the preference for either one of the following types of LOs:
• LO with General Topic “Computer Science” (⟨a1, {v1/1}⟩) and of “Theoretical” Genre (⟨a2, {v2/1}⟩);
• LO targeted to “researchers” (⟨a3, {v3/1}⟩) with Specific Topic on “Programming” (⟨a4, {v3/1}⟩);
• LO with Specific Topic on “Programming” (⟨a4, {v3/1}⟩) and of “Theoretical” Genre (⟨a2, {v2/1}⟩).
In this simulation, we assumed that the learner may make a total number of accesses N = 50. To simulate the learner behavior, we generated three probability distributions for the random pick of LOs. The following rule defines the first probability distribution, which is related to the first ideal model component:

Prob(LOj) = μ_LOj^{a1}(v1) · μ_LOj^{a2}(v2) / Σ_{h=1}^{100} μ_LOh^{a1}(v1) · μ_LOh^{a2}(v2)    (7)

The remaining probability distributions were defined accordingly, using the second and the third component. The simulation proceeded by carrying out the following steps:
1. Select randomly an integer number in {1, 2, 3} (uniform distribution) to select a probability distribution for the random pick of LOs;
2. Select a LO according to the corresponding probability distribution;
3. Update the learner model as described in Section 3. A new model component is created if the LO is incompatible with the existing model components.
These three steps were iterated N times. At the end of the adaptation stage, for each learning resource LOj we compared the compatibility degree of LOj to the ideal model with the compatibility degree of LOj to the actual model. We expected that the two compatibility degrees would not differ too much. The entire simulation was run 100 times to gather statistically significant results. Fig. 3(a) shows the average values of the differences between the compatibility degrees of each LO with the ideal and the derived model. In fig. 3(b), the distribution of such values is shown. As can be observed, in about 50% of trials the differences between compatibility degrees were less than 0.15, and this percentage increases to about 75% in correspondence to a difference of 0.2. These results indicate a good performance of the adaptation algorithm, considering that the random pick of the learning resources (7) prevents the derived model from converging exactly to the ideal one.

Figure 3. Average differences in LO compatibility degrees with ideal and derived model (a) and their distribution (b) over 100 tests.

Also, in fig. 4 we report the distribution of the number of model components generated in 100 tests. It can be seen that in the most frequent case (55 tests) three model components were generated, thus reflecting the structure of the ideal model. In some cases (18 tests) the number of model components was less than required. Obviously, in these cases the matching performance was not fully satisfactory. In the remaining cases, more than three model components were derived. Anyway, this unnecessary redundancy did not hamper the matching performance.

Figure 4. Distribution of the number of model components in 100 tests
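Putting the pieces together, the adaptation strategy of Section 3 driven by this biased LO sampling can be sketched as a self-contained toy simulation (function names and the local compatibility helper are ours, not the paper's):

```python
import math
import random

def compat(lo, p):
    """Mean over shared attributes of the max-min overlap (possibility)."""
    shared = set(lo) & set(p)
    if not shared:
        return 0.0
    return sum(
        max(min(lo[a].get(x, 0.0), p[a].get(x, 0.0))
            for x in set(lo[a]) | set(p[a]))
        for a in shared
    ) / len(shared)

def adapt(model, lo, t, N):
    """One step of the Section 3 strategy: create a component for an
    incompatible LO (initial phase only), otherwise blend the best-matching
    component toward the LO with rate alpha_t = exp(-alpha*(t-1))."""
    alpha = math.exp(-(10 * math.log(2) / (N - 10)) * (t - 1))
    delta = 0.5 if t <= 0.1 * N else 0.0           # threshold delta_t
    best = max(model, key=lambda p: compat(lo, p), default=None)
    if best is None or compat(lo, best) < delta:
        model.append({a: dict(mu) for a, mu in lo.items()})  # new component
        return
    for a, mu_lo in lo.items():
        mu_p = best.setdefault(a, {})              # degenerate set if absent
        for x in set(mu_p) | set(mu_lo):
            mu_p[x] = (1 - alpha) * mu_p.get(x, 0.0) + alpha * mu_lo.get(x, 0.0)

def simulate(los, ideal, N=50, seed=0):
    """Steps 1-3: pick an ideal component uniformly, draw a LO with
    probability proportional to the product of its memberships for that
    component's values (the Prob(LO_j) rule), then adapt the model."""
    rng = random.Random(seed)
    model = []
    for t in range(1, N + 1):
        comp = rng.choice(ideal)                               # step 1
        weights = [
            math.prod(lo.get(a, {}).get(v, 0.0)
                      for a, mu in comp.items() for v in mu)
            for lo in los
        ]
        lo = (rng.choices(los, weights=weights, k=1)[0]        # step 2
              if sum(weights) > 0 else rng.choice(los))
        adapt(model, lo, t, N)                                 # step 3
    return model
```

With uniformly generated LOs and the three ideal components above, the model grown by `simulate` should, in most runs, end up with roughly one component per ideal preference, as in the reported experiments.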
5 Discussion
The last years have been characterized by a strong interest in adaptive e-learning systems able to suggest contents to learners by adapting them to their preferences and needs. In such systems, user modeling plays a key role in the definition of models that represent the preferences of learners in a significant manner. Another important aspect that has to be considered in the context of user modeling is the ability to dynamically update the derived models, so as to adapt them to the constant changes in the interests of the learners as they choose among the learning resources to visit.
In particular, this paper proposed a fuzzy representation of learner models and a strategy for the dynamic updating of such models, based on a procedure that resembles a competitive learning algorithm. The adaptation strategy continuously updates the models, taking into account the resources that each learner chooses during the interaction with the system. This strategy is essentially characterized by two phases: an initial phase (plastic phase) that is more sensitive to updates, and a second phase (stable phase) that is less sensitive to adaptation, thus enabling the suggestion of LOs.
The results obtained by the simulations carried out have shown that the adaptation algorithm converges to significant models, including a number of components useful to describe the changing interests of learners.
Future research will investigate methods for refining the adaptation procedure by taking into account several issues, such as merging similar model components or pruning useless model components.

References
[1] N. Adler and S. Rae. Personalized learning environments: The future of e-learning is learner-centric. E-learning, 3:22–24, 2002.
[2] S. C. Ahalt, A. K. Krishnamurthy, P. Chen, and D. E. Melton. Competitive learning algorithms for vector quantization. Neural Networks, 3(3):277–290, 1990.
[3] P. Brusilovsky. Adaptive and intelligent technologies for web based education. Intelligent Systems and Teleteaching, 4:19–25, 1999.
[4] P. Dolog, N. Henze, W. Nejdl, and M. Sintek. Personalization in distributed elearning environments. In Proc. of WWW2004, The Thirteenth International World Wide Web Conference, New York, USA, 2004.
[5] V. M. García-Barrios. Adaptive e-learning systems: Retrospection, opportunities and challenges. In Proc. of International Conference on Information Technology Interfaces (ITI 2006), pages 53–58, Cavtat, Croatia, June 12-22, 2006.
[6] C. Jing and L. Quan. An adaptive personalized e-learning model. In Proc. of 2008 IEEE International Symposium on IT in Medicine and Education, pages 806–810, 2008.
[7] W. Pedrycz and F. Gomide. An Introduction to Fuzzy Sets: Analysis and Design. MIT Press, Cambridge (MA), 1998.
[8] L. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.
[9] L. Zadeh. Precisiated natural language (PNL). AI Magazine, 25(3):74–91, 2004.
[10] L. Zadeh. Is there a need for fuzzy logic? Information Sciences, 178:2751–2779, 2008.
Adaptive learning using SCORM compliant resources
Pierpaolo Di Bitonto, Teresa Roselli, Veronica Rossano
Department of Informatics
University of Bari
70125 Bari, Italy
{dibitonto, roselli, rossano}
Lucia Monacis, Rino Finamore, Maria Sinatra
Department of Psychology
University of Bari
70125 Bari, Italy
[email protected],
[email protected]
Abstract — In recent years great efforts of e-learning research
have been focused on customising learning paths according to
user preferences. Starting from the consideration that individuals
learn best when information is presented in ways that are
congruent with their preferred cognitive styles, the authors built
an adaptive learning object using the standard SCORM, which
dynamically related different learning content to students’
cognitive styles. This was performed in order to organize an
experimental study aimed at evaluating the effectiveness of an
adaptive learning object and the effective congruence of this
adaptive learning object with the presentation modes and
cognitive styles.
The sample was made up of 170 students enrolled in two
different University courses. The data were gathered by a
Cognitive Styles Questionnaire to identify each student's cognitive
profile, a Computer Attitude Scale to assess the computer-related
attitude, and Comprehension Tests. The results indicated that
there was a good flexibility of the adaptive learning object, and
that analytic and imager subjects showed more positive computer attitudes, related to a better comprehension of the
learning content.
Keywords-component: cognitive style, adaptive learning object,
SCORM standard
Nowadays we are experiencing a radical change in the
didactic and education system which is leading several schools,
universities, and companies to adopt state of the art Web based
technologies as a new means of managing and sharing
knowledge. Such a change is favoured by the numerous
advantages guaranteed by Distance Education. One of the most
notable and often mentioned benefits is flexibility in time and
space: the majority of programs allow students to learn when
and where it is more convenient for them, without the grind of
the traditional classroom setting. On the other hand, in Distance
Education the lack of the teacher’s continuous monitoring of
the student’s activities can cause distraction and frustration.
In the last thirty years, the Adaptive Hypermedia have been
the focus of Distance Education research. In [1] Brusilovsky
considers the problem of building adaptive hypermedia
systems and states that the student’s background, experience
and preferences should be taken into account. As a consequence,
in recent years a great number of works have been carried out
in the adaptive hypermedia and user modelling research [2, 3,
4, 5].
Moreover, as psychological investigations have revealed
that individuals learn best when information is presented in
ways that are congruent with their preferred cognitive styles
[6], the effort of research in the adaptive learning area has been
focused on the use of students’ cognitive and learning styles, as
reported in [7, 8, 9].
The authors’ research work was aimed at defining a technique to design and build adaptive learning paths in e-learning environments using the standard SCORM. In [10] a
first technique to adapt the learning content of a SCORM
package according to the learner cognitive styles was
presented. The Italian Cognitive Styles Questionnaire defined by De Beni, Moè, and Cornoldi [11] was used to define how to
tailor the learning content to the students’ profiles.
The main issue for defining an effective tailoring technique is the analysis of the relationship between cognitive styles and the
way of presenting learning material.
In this context, an experimental study was carried out to
assess both the effectiveness of an adaptive learning object
which relates different learning content to the students’
cognitive styles, and the congruence between the presentation
modes and cognitive styles.
Since the ’80s several studies have shown that the use of Distance Education systems improves the performance of those
students who interact with these environments compared to
those who interact in a traditional classroom [12, 13, 14].
However, since the ’90s many researchers have been
consistently asking how the structure and the learning material
interact with students’ cognitive styles. Previous investigations
focused generally on the physical organisation and external
appearance of the learning material, i.e. the physical layout,
such as the size of the viewing window, the inclusion of
headings, etc. [15]. Other studies [16, 17] stated that the
manner of presentation as represented by verbal, pictorial or
auditory modes affected learning performance according to
cognitive style.
As far as the concept of cognitive style is concerned, it
should be noted that it refers to the specific way in which an individual encodes, organizes, and performs with information, leading to a cognitive management of learning
strategies [18]. Consequently, there are several different
cognitive styles.
In 1991 Riding [19] suggested that all cognitive styles could be categorised according to two orthogonal dimensions: the wholist-analytic dimension and the verbaliser-imager one. The former dimension can be considered as the tendency to process information either as an integrated whole or in discrete parts of that whole. Thus, wholists are able to view learning content as complete wholes, but they are unable to separate it into discrete parts; on the contrary, analytics are able to apprehend learning content in parts, but they are unable to integrate such content into complete wholes.
The latter dimension can be considered as the tendency to process information either in words or in images. Verbalisers are better at working with verbal information [20], whereas imagers are better at working with visual and spatial information, i.e., with text-plus-picture.
Starting from these introductory statements, our research work aims at defining a technique to build an adaptive SCORM learning object which can be tailored on the basis of the learner's cognitive styles. The cognitive styles were classified according to the Italian Cognitive Styles Questionnaire, which provides the different users’ profiles divided into wholists, analytics, verbalisers, and imagers. The Questionnaire details are presented in section V.
The SCORM standard (Sharable Content Object Reference Model) is one of the most widespread standards used for building LOs, because it allows interoperability between the content (LO) and the container (LMS). The standard thus offers the possibility of defining didactic content that can be easily adapted to the user-LMS interaction. In order to understand how user adaptation can be possible, some details on the SCORM standard should be given.
The SCORM consists of: the Content Aggregation Model (CAM), which describes how the SCORM package should be built; the Run Time Environment (RTE), which simulates the LMS behaviour; and the Sequencing and Navigation (SN), which describes how each LO component should be aggregated in order to offer different learning paths to the users.
The CAM specification describes the components used in a learning experience, how to package and describe those components and, finally, how to define sequencing rules for the components. Figure 1 depicts the organisation of learning content in a SCORM package. The learning content is made up of assets, which are the smallest parts of an LO, such as a web page, a text or an image. The assets are, then, aggregated in Sharable Content Objects (SCOs), which have to be tagged with metadata in order to facilitate their search and reuse. The SCOs are, in fact, the smallest units that can be launched and traced by the LMS. The next level is the aggregation, which is not a physical file but just a representation of the organisation of a SCORM package. The aggregation represents the rules of sequencing used to aggregate the different SCOs and/or assets. The SCORM package may, therefore, consist of one or many SCOs and assets.
Figure 1. SCORM package organisation [ADL]
At this point the Sequencing and Navigation specifications
are used to define the tree structure and sequencing behaviour
used to navigate among the different components of the SCORM package, building different learning paths. Using the SN it is
possible, during user interaction, to dynamically choose which
SCO has to be launched by the LMS. This allows the LMS to
build different customised learning paths in the same SCORM package.
The real context chosen for the experimental study is the
course of Psychology of Communication for undergraduate
students belonging to different degree courses: Informatics and
Digital Communication, and Humanities. The use of the same
content in different learning contexts and with different
learners (with different backgrounds and different learning
approaches) allowed the authors to assess whether learning
content customisation, on the basis of the learner preferences,
could be successful at any time.
The chosen content dealt with three different topics
concerning communication: structure, the various functions
and the persuasive models of communication.
Each topic was divided into two didactic units: the first one
represented the learning content, described using different
presentation modes according to the users’ cognitive styles; the
second one represented the reinforcement of the same learning
content. Moreover, each didactic unit was followed by a
multiple-choice test (Comprehensive Test). The overall number
of tests was 24. The navigation among the different units will
be explained in section VI.
Defining the rules to be implemented in the SN of the
SCORM package required the definition of the learner
cognitive styles, obtained by submitting the Cognitive
Styles Questionnaire developed in 2003 by De Beni, Moè, and
Cornoldi [11].
It consists of two parts, with nine items and a 5-point Likert scale for each style. To assess either the wholistic style or the
analytical one, students have to observe a figure for thirty
seconds and reproduce it. The figure, a sort of Rey’s test (1966)
revised experimentally by Cornoldi, De Beni, and the MT
Group [21], included both a global configuration and some
elements regarding a missile, a big pencil, a little flag, single
shapes, etc. Nine items were provided to assess the students’
preference towards the wholistic style (5 items) or the analytic
one (4 items). All items concerned both the analysis of figure
(3 items) and various situations (6 items). As regards the
students’ preference towards verbaliser or imager styles, twelve
words and twelve images are proposed. Students have to
answer nine items: four items concerned the verbaliser style
and five items concerned the imager one. All items referred to
the required task consisting of writing the learning material.
The questionnaire had to be completed within 25 minutes.
In order to define the user cognitive style a score has to be
calculated according to the rules provided in the questionnaire.
In both cases the score can vary from 9 to 45. The higher the
score obtained, the higher the subject’s preference for the
wholistic style, in the first case, and for verbaliser style in the
second one. Therefore, the questionnaire result for each student
is an ordered list of cognitive styles. This ordered list allows
the software agent to choose the most appropriate presentation
mode for each student. This information is recorded in the
learner profile used by the SN rules to select the SCO to be launched.
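For illustration, one way such an ordered list could be derived from the two 9-45 scores is sketched below; the midpoint split and the style names are our assumptions, not the questionnaire's published scoring rules:

```python
def style_order(wa_score, vi_score, midpoint=27):
    """Rank the four styles by how far each 9-45 score leans toward its
    pole (wholist vs analytic, verbaliser vs imager). Illustrative only:
    the cut-off and ranking scheme are hypothetical."""
    prefs = [
        ("wholist", wa_score - midpoint),      # high score -> wholistic
        ("analytic", midpoint - wa_score),
        ("verbaliser", vi_score - midpoint),   # high score -> verbaliser
        ("imager", midpoint - vi_score),
    ]
    return [style for style, lean in sorted(prefs, key=lambda p: -p[1])]
```

For example, a strongly wholistic learner with a mild preference for images would get a list starting with "wholist", then "imager", which the selection agent could walk through in order.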
traceable by a LMS, contains a didactic unit for a cognitive
style (i.e. the Persuasive Models of Communication for
imaginer learners) and the Comprehensive Test (CT) useful for
verifying the learner information acquisition. The overall
number of the SCOs is 24, since we had three topics, for each
of them two didactic units represented using four presentation
The SCOs, then, are organised using a tree aggregation
form that represents the logical organisation of the learning
content given by the domain expert and described in section
IV. The Sequencing and Navigation rules are used to explore
the tree choosing the right SCO to be launched according to the
user’s interaction. Figure 2 depicts part of the LO navigation.
Each single box represents a SCO, which contains a specific
domain concept for a cognitive style (i.e. Pervasive
Communication for imaginer learners) and the Comprehensive
Test. The arrows show the navigation flow among the SCOs:
the ones represented using the straight line indicate that the
learner passes the Comprehensive Test, otherwise the dotted
line is used. If the learner passes the CT, the new SCO,
launched by the LMS, will contain the next learning content
using the same cognitive style or presentation mode (i.e.
verbaliser in the figure). In the event of a learner responding
incorrectly, a reinforcement (using the same cognitive style)
will be presented, and, finally, a new CT is presented to the
learner. If the learner passes this CT, s/he can go on in the
learning path using the same cognitive style. If the learner fails
the CT twice, we assume that s/he needs to study the content
using a different presentation mode. Thus, the LMS launches
the SCO that contains the same learning content depicted using
a different cognitive style according to the leaner information
In order to build an interoperable LO that could be easily
integrated into any e-learning environment, the SCORM
standard was chosen. In designing a SCORM package, the first
issue to consider is the granularity of each individual SCO.
Since the first definition of LO was given [22], it is well known
that the most difficult problem is the definition of the optimal
size of an LO for it to be sharable, reusable and effective. If the
LO has a low level of granularity, for example a chemistry
course, it would be difficult to reuse without changes in other
contexts, such as in an Engineering curriculum. On the other
hand if the LO has a high level of granularity, for example an
animation of a chemical reaction, it could be reused in many
contexts, in different ways and with different learning goals,
such as in a lesson aiming at showing the atom metaphor for
LO. But, if the LOs are small and have a high level of
granularity, it will be impossible for a computer agent to
combine them without the intervention of a human
instructional designer. This problem, called reusability
paradox, has been formalised by Wiley [23]: if a learning
object is useful in a particular context, by definition it cannot
be reused in a different context, on the other hand, if a learning
object can be reused in many contexts, it is not particularly
useful in any.
Figure 2. The adaptive learning object organisation
In our context, in order to obtain a high level of personalisation of the learning content, a high level of granularity was chosen. Therefore, each SCO, the smallest unit
It is important to investigate the relationship between Cognitive Styles, Computer Attitudes, and the manner of presenting learning material, in order to assess the effective adaptivity of the Learning Object.
A. Procedure
The general design of our study involved a comparison
between students’ computer attitudes, their own cognitive
style, and the specific learning material. To this purpose, we
used assessment tests, preference scales, and the adaptive
learning object previously described.
B. Method
1) Participants
A sample of 173 undergraduate students, from both degree
courses, was employed for this study. Seven of them were not
recorded. The mean age was 20.45 with an SD of 2.03.
2) Instruments
In order to assess subjects' cognitive styles, the questionnaire described in section V was used. Computer attitudes, on the other hand, were assessed using the Computer Attitude Scale (CAS) developed by Al-Khaldi and Al-Jabri [25], reviewed by Shu-Sheng Liaw [24], and translated into Italian, in view of the lack of an Italian version of this kind of scale. Subjects are asked to indicate their perceptions of computer literacy, liking, usefulness, and intention to use and learn computers. The items are all seven-point Likert scales (from ``strongly disagree'' to ``strongly agree''), assessing three components of attitudes towards computers, i.e. affective, behavioral and cognitive. The total number of CAS items is 16, divided into 8 items for the affective score (items 1-8), 4 items for the cognitive score (items 9, 10, 15, 16) and 4 items for the behavioral score (items 11, 12, 13, 14). The theoretical minimum and maximum possible scores on this scale are 16 and 112, respectively. The attitude towards the computer is assessed according to the score obtained by students on the scale:
16-48 low attitude
49-80 average attitude
81-112 high attitude
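As a sketch, the item grouping and total-score banding described above can be computed as follows (the function and variable names are ours; responses are the 1-7 Likert answers):

```python
# Illustrative scoring helper for the 16-item CAS described above.
AFFECTIVE_ITEMS  = range(1, 9)        # items 1-8
COGNITIVE_ITEMS  = (9, 10, 15, 16)    # items 9, 10, 15, 16
BEHAVIORAL_ITEMS = (11, 12, 13, 14)   # items 11-14

def score_cas(responses):
    """responses: dict mapping item number (1-16) to a 1-7 Likert answer."""
    affective  = sum(responses[i] for i in AFFECTIVE_ITEMS)
    cognitive  = sum(responses[i] for i in COGNITIVE_ITEMS)
    behavioral = sum(responses[i] for i in BEHAVIORAL_ITEMS)
    total = affective + cognitive + behavioral   # ranges from 16 to 112
    if total <= 48:
        attitude = "low"
    elif total <= 80:
        attitude = "average"
    else:
        attitude = "high"
    return {"affective": affective, "cognitive": cognitive,
            "behavioral": behavioral, "total": total, "attitude": attitude}
```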
Moreover, each subscale is scored separately, which also makes it possible to determine any predominance amongst the three components. The “affective” subscale score goes from 1 to 56 and is divided in the following way:
1-18 low
19-37 average
38-56 high
The “behavioral” and “cognitive” subscale scores go from 1 to 28 and both are divided in the following way:
1-9 low
10-19 average
20-28 high.
C. Results
The data derived from the Cognitive Styles Questionnaire revealed that subjects were labeled as analytics, wholists, verbalisers and imaginers, and were analysed in relation to their results in the first Comprehensive Test. Data showed that the best scores in the first Comprehensive Test were obtained by analytics and imaginers, with 75.00% and 71.11% respectively; on the other hand, the worst were obtained by verbalisers with 61.11% and wholists with 56.10%. In order to analyze a possible association between the 69 subjects with wrong responses (amounting to 41.57%) and Computer Attitude scores, the CAS instrument was used; the data are illustrated in the following table.
[Table: Computer Attitude scores (low, average, high) by cognitive style]
Data indicated a significant effect of Computer Attitude scores on cognitive style. Specifically, verbalisers and wholists obtained higher scores in the low-attitude band, whereas analytics and imaginers obtained higher scores in the average and high bands. This leads us to affirm that analytic and imaginer subjects demonstrated more positive attitudes towards the computer than verbaliser and wholist ones.
2nd STEP
At this point the 69 subjects were asked to take the second CT, in order to achieve academic success. Results are presented in the table below.
[Table: second Comprehensive Test results by cognitive style]
Given the high percentage of mistakes, i.e. 53.62%, this result was analyzed with the Computer Attitude Scale in order to confirm the relationship between cognitive style and Computer Attitude.
[Table IV: Computer Attitude scores by cognitive style after the second CT]
Data in Table IV show a significant effect of Computer Attitude and Cognitive Style. Verbaliser and wholist subjects obtained the lowest scores in the high Computer Attitude band, confirming less positive attitudes towards the computer, whereas analytics and imaginers presented the lowest scores in the low Computer Attitude band. Moreover, given the high level of wrong responses of wholists and verbalisers to the 2nd Comprehensive Test, the SCORM package guided these subjects to their second preferred cognitive style, starting the third step of the learning task.
3rd STEP
Data indicated the new percentage of the sample distributed according to the switching of the cognitive style.
[Table: sample distribution after the switching of cognitive style]
After the 1st Comprehensive Test, 78.38% of responses were found to be correct. On the other hand, the presence of some wrong responses, i.e. 21.62%, led to a comparison with the CAS.
[Table: Computer Attitude scores (low, high) by cognitive style after switching]
From the scoring it emerged that nobody obtained any score in the high Computer Attitude band, and wholists scored nothing on the CAS. Furthermore, the verbalisers were found to be the most negative towards computers. In the average Computer Attitude band, analytics were more positive than imaginers.
In the first CT, 58.43% of subjects gave correct responses in relation to their favourite cognitive style. The best results were obtained by subjects with a positive Computer Attitude: more than half of these subjects (31.93% of the sample) showed a higher attitude towards the computer. Amongst them, analytics were found to be the most positive in object learning, followed by imaginers, wholists and verbalisers.
Moreover, analytics confirmed their positive attitude in the behavioral and affective subscales during the two Comprehensive Tests: in the first CT 35% of analytics were behavioral, whereas in the second CT 50% were found to be affective; amongst imaginers, 40% were “affective” and 60% “behavioral”, respectively in the first and the second CT.
Furthermore, in the switching of cognitive style, the data confirmed that those with correct CTs and a more positive attitude were analytics with 34.48%, followed by imaginers with 31.03%, and wholists and verbalisers, both with 17.24%.
The pilot study has demonstrated one of the main advantages of Computer Supported Learning, i.e. the customisation of learning paths according to students’ cognitive styles, in order to obtain academic success. From a purely IT point of view, this paper has presented an example of an adaptive LO built using the SCORM standard. The customisation of the learning path in the LO was first defined by the learner's favourite cognitive style resulting from the Cognitive Style Questionnaire developed by De Beni, Moè, and Cornoldi, and then analysed with the Computer Attitude Scale, in order to explain the main reasons for unsuccessful learning. Future investigations will involve a sample of students with different educational backgrounds. Moreover, the results obtained will allow us to define rules in the expert system presented in [10] to adapt any SCORM-compliant LO.
REFERENCES
[1] P. Brusilovsky, “Methods and techniques in adaptive hypermedia,” User Modelling and User-Adapted Interaction, 6(2-3), 87-129, 1996.
[2] K. VanLehn, “The behaviour of tutoring systems,” International Journal of Artificial Intelligence in Education, 16(3), pp. 267-270, 2006.
[3] M. Xenos, “Prediction and assessment of student behaviour in open and distance education in computers using Bayesian networks,” Computers & Education, 43, 345-359, 2004.
[4] T. I. Wang, K. H. Tsai, M. C. Lee, and T. K. Chiu, “Personalized Learning Objects Recommendation based on the Semantic-Aware Discovery and the Learner Preference Pattern,” Educational Technology & Society, 10(3), 84-105, 2007.
[5] C. Bravo, W. R. van Joolingen, and T. de Jong, “Using Co-Lab to build System Dynamics models: Students' actions and on-line tutorial advice,” Computers & Education, in press, available online 19 March 2009.
[6] R. E. Riding and M. Grimley, “Cognitive style and learning from multimedia materials in 11-year-old children,” Br. J. Ed. Tech., vol. 30, January, pp. 43-59, 1999.
[7] A. Calcaterra, A. Antonietti, and J. Underwood, “Cognitive style, hypermedia navigation and learning,” Computers & Education, 44, 441-457, 2005.
[8] H.-W. Chou, “Influences of cognitive style and training method on training effectiveness,” Computers & Education, 37, 11-25, 2001.
[9] P. Notargiacomo Mustaro and I. Frango Silveira, “Learning Objects: Adaptive Retrieval through Learning Styles,” Interdisciplinary Journal of Knowledge and Learning Objects, vol. 2, 35-46, 2006.
[10] P. Di Bitonto, T. Roselli, and V. Rossano, “A rules-based system to achieve tailoring of SCORM standard LOs,” International Workshop on Distance Education Technologies (DET 2008), part of the 14th International Conference on Distributed Multimedia Systems, Boston, USA, 4-6 September, 2008.
[11] R. De Beni, A. Moè, and C. Cornoldi, AMOS. Abilità e motivazione allo studio: valutazione e orientamento. Questionario sugli stili cognitivi. Erickson: Trento, 2003.
[12] T. R. H. Cutmore, T. J. Hine, K. J. Maberly, N. M. Langford, and G. Hawgood, “Cognitive and Gender Factors Influencing Navigation in a Virtual Environment,” Int. J. Hum. Comp. St., vol. 53, pp. 223-249.
[13] P.-L. P. Rau, Y.-Y. Choong, and G. Salvendy, “A Cross Cultural Study on Knowledge Representation and Structure in Human Computer Interfaces,” Int. J. Ind. Erg., pp. 117-129, 2004.
[14] M. Workman, “Performance and Perceived Effectiveness in Computer-Based and Computer-Aided Education: Do Cognitive Styles Make a Difference?,” Comp. Hum. Behav., vol. 20, pp. 517-534, 2004.
[15] G. Douglas and R. J. Riding, “The Effect of Pupil Cognitive Style and Position of Prose Passage Title on Recall,” Ed. Psych., vol. 13, pp. 385-393, 1993.
[16] R. J. Riding and I. Ashmore, “Verbaliser-imager learning style and children’s recall of information presented in pictorial versus written form,” Ed. Psych., vol. 6, pp. 141-145, 1980.
[17] R. J. Riding and D. Mathias, “Cognitive Styles and preferred learning mode, reading attainment and cognitive ability in 11-year-old children,” Ed. Psych., vol. 11, pp. 383-393, 1991.
[18] R. E. Riding and I. Cheema, “Cognitive Styles: An Overview and Integration,” Ed. Psych., vol. 11, pp. 193-215, 1991.
[19] R. E. Riding, Cognitive Styles Analysis. Birmingham: Learning and Training Technology, 1991.
[20] R. J. Riding and D. Mathias, op. cit.; R. J. Riding and M. Watts, “The effect of cognitive style on the preferred format of instructional material,” Ed. Psych., vol. 17, pp. 179-183, 1997.
[21] C. Cornoldi, R. De Beni, and Gruppo MT, Imparare a studiare 2. Erickson: Trento, 2001, p. 209.
[22] D. A. Wiley, J. B. South, J. Bassett, L. M. Nelson, L. L. Seawright, T. Peterson, and D. W. Monson, “Three common properties of efficient online instructional support systems,” The ALN Magazine, 3(2), 1999.
[23] D. A. Wiley, “Connecting learning objects to instructional design theory: A definition, a metaphor, and a taxonomy,” in The Instructional Use of Learning Objects, D. A. Wiley, Ed., joint publication of the Agency for Instructional Technology and the Association for Educational Communications and Technology.
[24] S.-S. Liaw, “An Internet survey for perceptions of computers and the World Wide Web: relationship, prediction, and difference,” Comp. Hum. Beh., 18, 17-35, 2002.
[25] M. A. Al-Khaldi and I. M. Al-Jabri, “The relationship of attitudes to computer utilization: new evidence from a developing nation,” Comp. Hum. Beh., 14(1), 23-42, 1998.
Enhancing online learning through Instructional Design:
a model for the development of ID-based authoring tools
Giovanni Adorni , Serena Alvino , Mauro Coccoli
Department of Computer, Communication, and Systems Science, University of Genoa
Viale Causa, 13 - 16145 Genova, Italy
{adorni, mauro.coccoli}
Institute for Educational Technologies – Italian National Research Council
Via De Marini, 6 - 16149 Genova, Italy
[email protected]
Abstract
In this paper, a novel point of view on online learning is provided by the integration of Instructional Design (ID) principles and procedures within the field of Educational Technology. In fact, current educational technologies and tools do not adequately support teachers when creating, searching for and reusing Learning Objects (LOs); authoring processes are rarely personalized, and pedagogical and contextual information is often left aside, as is the implementation of collaborative learning activities. ID principles and procedures, which normally help teachers to take the most adequate design choices, can thus provide useful support if embedded in the interface of online learning authoring systems and tools. In this respect, Design Models can guide the creation of different types of LOs, as well as of lesson plans and activities referring to them, through a number of templates and representations. Also, LOs must have a detailed description with pedagogical annotations, in addition to standard metadata, and they should be categorized on the basis of their format so that personalized learning paths can be designed. According to these premises, this contribution presents a model to develop a new generation of software systems and tools embedding innovative ID methodologies.
1. Introduction
A number of different definitions and conceptions of "e-learning" can be found in the literature. The European Commission defines the e-learning concept as "the use of new multimedia technologies and the Internet to improve the quality of learning by facilitating access to resources and services as well as remote exchanges and collaboration" [1]. On the one side, this definition emphasizes the important role of ICTs and educational technologies as the way to support the "social dimension" in formal and informal learning processes. Both the increasing success of online learning and CSCL (Computer Supported Collaborative Learning) initiatives [2] and the wide spread of Web 2.0 [3] technologies and social networking tools point out how "active" and "collaborative" learning is a fundamental paradigm in the current knowledge society. Secondly, the EC definition emphasizes the quality of learning, which can be achieved by identifying, sharing and adopting methodologies and best practices for both individual and collaborative learning; to this end, specific Instructional Design (ID) models and strategies have been developed in recent years to support the design of effective e-learning initiatives. ID is "a construct referring to the principles and procedures by which instructional materials, lessons and whole systems can be developed in a consistent and reliable fashion" [4]. ID principles and procedures are normally rendered explicit through design models (DMs), which are a kind of abstract design rules for a given educational theory or didactic strategy that tell how to organize appropriate materials, lessons or learning scenarios to achieve specific learning objectives. Recent approaches to ID point out that the design process, as it is really put into practice by expert designers, is not a procedure but a problem-solving process, guided by heuristics and best practices held as effective for a specific problematic situation [5]. According to this perspective, as demonstrated by a number of studies [6, 7], the alternative of rendering explicit and formalizing heuristics and best practices through DMs for the design and management of learning resources and activities becomes more and more relevant in the educational research field. This prospect has become especially significant for the field of CSCL, where best practices on how to structure effective individual or collaborative learning processes are still hardly shared by experts [2]. In addition, current trends in the e-learning field [8] are also showing the benefits coming from the investment in
the creation, sharing and reuse of Learning Objects
(LOs), defined by Wiley [9] as "any digital resource
that can be reused to support learning"; this widely accepted definition refers to both standard-based LOs (e.g. SCORM [10]) and LOs supporting collaborative learning [8]. So, at present, the "community" is playing an increasingly key role in the e-learning field, both when involved in formal or informal collaborative
learning and when involved in the sharing of best
practices, through the formalization and the reuse of
LOs [11] and DMs [12, 13]. In fact, designers and
teachers can create resources and share them within a
professional community; these resources can support
both individual and collaborative learning and can also
be compliant with international standards so as to be interoperable and automatically interpretable by LMSs.
But how many teachers are able to design an effective LO and describe it adequately so as to foster easy retrieval within a repository? How many of them are
able to integrate these resources in active and/or
collaborative learning activities? Recent studies [14, 5]
have pointed out that e-learning practitioners,
especially when unskilled, need to be supported both
when creating and describing a LO and when
designing a collaborative learning activity.
In this perspective, in line with recent research studies on ID and Learning Design [12, 13], this contribution aims to present a model to develop a new generation of software systems and tools which embed innovative ID methodologies; these tools would be able to support unskilled teachers and designers when creating learning materials or designing activities, lessons and courses, so as to effectively structure the contents and the activities according to specific heuristics and good practices.
2. ID models and pedagogical metadata: new challenges for online learning authoring tools
Practitioners and researchers, having different educational and technological backgrounds, hardly share a common view on how to support an effective learning process by means of technologies and distance education good practices. The actual added value in the design of applications supporting online learning is the integration of different points of view, raised by both designers and end-users, and including both educational and technological perspectives. Given the current situation and the existing platforms and tools, one can observe that many actors are involved in the teaching and learning process and that different objectives are to be considered, depending on specific points of view; sometimes, some of the objectives are opposed to others. International initiatives such as IMS (the Global Learning Consortium), OKI (the Open Knowledge Initiative), and ELF (the E-Learning Framework) [15] put in evidence that different online learning applications may need different characteristics from both a technical and a functional point of view: schools, universities, industry, corporate and life-long learning have very different requirements. As a matter of fact, there is a convergence of a number of different users' needs in just one system, performing multiple functions and managing different users' roles. There is also a convergence of many theoretical models and many possible technical solutions. Yet the technological support is not so flexible. The Learning Management System (LMS) has been the main actor of Internet-based education for the past two decades and the main delivery system for standard-compliant LOs. However, the traditional conception of the LMS is failing to keep pace with recent advances in education, information and communication technology, and the semantic web [16]. There is much more; thus, a modular architecture is needed for LMSs, one that can interact with a wide variety of services and tools that may be needed, and may even differ from case to case, for achieving the best results in learning and teaching [17]. In this respect, many researchers have already investigated how to bridge ID and learning content [18]. In such a context, it is clear that a new generation of software tools designed to simplify the work of users within their design, teaching and learning activity is needed [19], so that online learning can be significantly improved.
2.1. Embedding design models into online learning authoring tools
Teachers are often unskilled in creating or retrieving educational resources which fit in with the needs of their educational context, and often lack competencies on how to share them to foster reusability within a community [14, 20]. In addition, sharing educational resources is not a straightforward task for teachers, but requires from them a good amount of work, both to integrate other people’s productions in their own lessons and to prepare new contributions in an easily re-usable and adaptable form; as a consequence, LO technology struggles to gain momentum and acceptance in the communities of teachers and instructional designers [6]. ID heuristics and practices can provide fundamental support to teachers for: a) identifying the main constraints characterizing the specific educational context; b) designing effective LOs and learning activities taking those constraints into account; c) searching for reusable resources which can be effectively integrated in a specific learning path. These heuristics are especially important in the context of CSCL, where good practices about how to structure computer-mediated interactions are still hardly shared by experts [2].
Traditional ID methods and online learning ones can be moulded and structured into design models (DMs), i.e. schemata, scripts and meta-models embedding a specific pedagogical approach, that support teachers in developing educational proposals; these resources can be reused in different educational contexts [6]. In particular, the bridging of collaborative learning and traditional ID methods [12] by means of CSCL scripts, and especially macro-scripts, has recently attracted a lot of attention. CSCL macro-scripts are models that formalize and represent a sequence of activities aimed at fostering a meaningful learning process in a group [ibid.]. They can be reused and instantiated (adapted, contextualized) in different educational contexts, being formalized at different levels of abstraction; the most abstract level is independent from the content and, generally, represents the solution to a recurrent educational problem (e.g. Pedagogical Design Patterns); other macro-scripts represent a particular instantiation of the general educational problem, suggesting the contents, roles, tools, services, etc., needed to support the activity (e.g. lesson plans and IMS-LD Units of Learning [13]). CSCL macro-scripts can thus be represented, shared and reused, exactly as designers and teachers usually do with LOs.
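As an illustration of the two levels of abstraction just described, a macro-script can be sketched as a content-independent sequence of activities that is later instantiated (contextualized) with concrete content. The class and attribute names below are our own, not part of any CSCL specification:

```python
from dataclasses import dataclass

@dataclass
class Activity:
    name: str
    description: str  # may contain a {topic} placeholder at the abstract level

@dataclass
class MacroScript:
    name: str
    activities: list  # ordered sequence of Activity steps

    def instantiate(self, topic):
        """Bind the abstract script to concrete content (contextualization)."""
        bound = [Activity(a.name, a.description.format(topic=topic))
                 for a in self.activities]
        return MacroScript(f"{self.name}: {topic}", bound)

# Abstract level: the solution to a recurrent educational problem.
jigsaw = MacroScript("Jigsaw", [
    Activity("expert groups", "each group studies one aspect of {topic}"),
    Activity("jigsaw groups", "mixed groups pool their aspects of {topic}"),
])

# Concrete level: a particular instantiation, as in a lesson plan.
lesson = jigsaw.instantiate("photosynthesis")
```

The Jigsaw pattern is used here only as a familiar example of a recurrent collaborative structure.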
An innovative approach for the development of a
new generation of authoring tools fostering the design
of online learning is to integrate DMs in the system
interface, in order to support unskilled teachers in the
design phase of LOs, activities and modules. Currently,
the new research lines focused on the formalization of macro-scripts are systematically translated into practice only by initiatives which implement Learning Design-based [13] authoring tools and platforms (such as RELOAD, ReCourse, LAMS or COLLAGE). Unfortunately, Learning Design theories [21], which propose to represent the learning process by means of formal languages (EML, Educational Modeling Languages), have shown their limits; in fact, although different research lines are currently engaged in identifying methodologies and tools for bringing Learning Design closer to designers' and teachers' daily practice, technical specifications such as IMS-LD [13] are not yet so widespread in the e-learning field. This is due, on the one side, to their complexity and, on the other side, to the limits embedded in their semantics, which do not allow the direct representation of groups and their structuring in collaborative activities [22]. On the contrary, other initiatives have demonstrated that macro-scripts, when embedded in the interface of design tools (see e.g. COLLAGE [ibid.]), can provide effective support in the design process [6]. Finally, current trends are pointing out the effective role of diagram-based graphical representations of ID best practices and macro-scripts when embedded in the interface of learning design authoring tools [7]. In addition, some advances have been made in modeling the design process of LOs. In recent years, some research initiatives have tried to define taxonomies of LOs according to their main technical characteristics and to their semantic dimension [9, 11]. So, different
approaches to the design process of LOs have been
proposed in the literature, some of them integrated in
specific authoring tools (e.g. RELOAD). But, from an
educational point of view, like any other instructional
technology, LOs must embed specific ID strategies [9].
So, new approaches overcome the limitations introduced by the main technical specifications [10]: some of these initiatives [23] are now trying to classify LOs according to their educational features, such as the embedded didactic strategy. In this perspective it could be feasible to model the structure of different LO typologies according to their pedagogical approach, and to model DMs' structure and flow through text and diagrams. Guidelines for supporting the creation of LOs and for instantiating DMs in a specific context can also be defined. All the ID models,
best practices and heuristics involved in the design of
this new generation of authoring tools for online
learning should be framed in an Instructional Design
Reference Model: this model, which constitutes one of
the peculiarities of this innovative approach, will guide
the design, the development and the integration of the
software application by defining a methodological and
pedagogical framework for: a) the definition of the
main design steps; b) the modeling of the main LO
typologies and of a set of reference didactic strategies
and DMs; c) the definition of guidelines for LO
creation and DMs instantiation.
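A classification of LOs by their embedded didactic strategy, as suggested above, could be sketched as follows; the strategy vocabulary here is purely illustrative, not a standard taxonomy:

```python
from dataclasses import dataclass

# Illustrative vocabulary of didactic strategies (our own, not a standard).
STRATEGIES = {"expository", "case-based", "collaborative", "drill-and-practice"}

@dataclass
class LearningObject:
    title: str
    strategy: str  # embedded didactic strategy

    def __post_init__(self):
        # Reject values outside the controlled vocabulary.
        if self.strategy not in STRATEGIES:
            raise ValueError(f"unknown didactic strategy: {self.strategy}")

def by_strategy(los, strategy):
    """Retrieve the LOs that embed a given didactic strategy."""
    return [lo for lo in los if lo.strategy == strategy]
```

A controlled vocabulary of this kind is what would let an authoring tool filter candidate LOs by pedagogical approach rather than by format alone.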
2.2. Fostering effective reuse through the
pedagogical annotation of LOs and design models
Another crucial issue for teachers and designers
who want to search for and share educational resources
is the identification of proper metadata models or
specifications allowing for an effective description and
an easy retrieval. Such descriptors should enable users to seek resources not only on the basis of technical
and bibliographic information, but also on the basis of
their contextual and educational features. As a matter
of fact, the description of the educational needs that
inspired the design of a LO, of the underlying
assumptions on learning and of the epistemological
and pedagogical approaches to the content significantly
supports the retrieval of potentially re-usable products
and fosters the reflection on their adaptability to the
specific context [6, 24]. Such pedagogical metadata
sets, together with a user-friendly interface for LOs
annotation and retrieval, could support users’
motivation to invest their time and efforts in the
design, implementation and diffusion of reusable LOs.
A number of metadata specifications have been proposed by various international initiatives (such as LOM [11], EdNA, TLF, and GEM), but the expressive power of these metadata sets is often unsatisfactory with respect to the underlying educational paradigm. In addition, as
we pointed out before, teachers, in their practice,
usually take advantage not only of learning material
directed to students, but also of DMs that represent
suggestions, work plans, best practices, etc., developed
by their peers. Teachers can therefore benefit from repositories that also support the description and retrieval of this kind of resource. Some
international proposals have been presented to improve
this situation, such as the POEM model (Pedagogy
Oriented Educational Metadata model) [6]; by means
of pedagogical vocabularies, validated by different
typologies of end-users, this innovative LOM [11]
application profile helps designers and teachers to
efficiently search for both LOs and DMs.
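The kind of retrieval enabled by such pedagogical descriptors can be sketched as a simple filter over annotated resources. The field names and values below are illustrative placeholders, not the actual POEM or LOM vocabulary:

```python
# Hypothetical pedagogically annotated resources: both LOs and DMs.
RESOURCES = [
    {"title": "Acid-base drill", "kind": "LO",
     "approach": "behaviourist", "interaction": "individual"},
    {"title": "Jigsaw script on ecosystems", "kind": "DM",
     "approach": "socio-constructivist", "interaction": "collaborative"},
]

def search(resources, **criteria):
    """Return the resources matching every given pedagogical descriptor."""
    return [r for r in resources
            if all(r.get(k) == v for k, v in criteria.items())]
```

For example, a teacher could retrieve collaborative design models with `search(RESOURCES, kind="DM", interaction="collaborative")`.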
The challenge of new authoring tools supporting the
creation of both LOs and DMs, such as macro-scripts,