Visual Re-Ranking for Multi-Aspect Information Retrieval

Khalil Klouche1,3, Tuukka Ruotsalo2, Luana Micallef2, Salvatore Andolina2, Giulio Jacucci1,2
1 Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, PO Box 68, 00014 University of Helsinki, Finland
2 Helsinki Institute for Information Technology HIIT, Aalto University, PO Box 15600, 00076 Aalto, Finland
3 Aalto University, School of Arts, Design and Architecture, Media Lab Helsinki, Hämeentie 135 c, 00560 Helsinki, Finland
1 [email protected], 2 [email protected]

ABSTRACT
We present visual re-ranking, an interactive visualization technique for multi-aspect information retrieval. In multi-aspect search, the information need of the user consists of more than one aspect or query simultaneously. While visualization and interactive search user interface techniques for improving user interpretation of search results have been proposed, the current research lacks understanding of how useful these are for the user: whether they lead to quantifiable benefits in perceiving the result space and allow faster and more precise retrieval. Our technique visualizes relevance and document density on a two-dimensional map with respect to the query phrases. Pointing to a location on the map specifies a weight distribution of the relevance to each of the query phrases, according to which search results are re-ranked. User experiments compared our technique to a uni-dimensional search interface with a typed query and ranked result list, in perception and retrieval tasks. Visual re-ranking yielded improved accuracy in perception, higher precision in retrieval and overall faster task execution. Our findings demonstrate the utility of visual re-ranking, and can help in designing search user interfaces that support multi-aspect search.

Keywords
Information visualization; information retrieval; multi-aspect search; multi-dimensional ranking
1. INTRODUCTION
Multi-aspect search refers to activities in which the information need of the user consists of more than one aspect or query simultaneously. Such situations arise in contexts such as exploratory search, item selection and multi-criteria decision making. In exploratory search activities, the user's goal is not clearly defined, and the information space is usually unfamiliar to the user. In such scenarios, the user might start from a small set of notions, with the intent of learning and making sense of the related document space. In this case, conventional result lists offer little insight into the data, and nothing indicates how the given results relate to the multiple aspects of the query.

Figure 1: Interactive relevance map visualization. (a) The position of a document marker is computed as a weighted linear combination of its relevance to the individual query phrases r1, r2, r3. (b) The radius of a document marker encodes the overall relevance of the corresponding document to all query phrases. (c) Opacity encodes the density of the document mass at a given position on the 2D plane. (d) The result list can be re-ranked by relevance and distance to the selected position rr.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. CHIIR '17, March 07-11, 2017, Oslo, Norway. © 2017 ACM. ISBN 978-1-4503-4677-1/17/03. $15.00. DOI: http://dx.doi.org/10.1145/3020165.3020174
For example, a user looking for recent literature on physiological measurements might want to search for aspects such as 'Electroencephalography', 'Electrodermal Activity' and 'Electromyography', and quickly be able to assess how the result space is distributed and how the retrieved documents relate to each aspect. Item or product selection is currently widely supported by faceted search and search result clustering. Such systems are widespread in e-commerce and library catalogs. These techniques allow the user to investigate the results through the use of multiple filters, but they offer limited support for perceiving the result space and weighting the aspects accordingly. Conventional query-based search tools usually visualize results as a one-dimensional ranked list, and offer limited support for multi-aspect retrieval. Another example is multi-criteria decision making, a well-researched process that often requires multi-aspect search. Take the example of a user looking online for a new car. Usual faceted tools allow her to select filters to narrow down the offering: e.g., a manufacturer, a price range, a fuel type. Such criteria require the user to have a specific goal in mind, whereas a typical user would be inclined to come up with vaguer criteria such as good gas mileage (no specific threshold in mind), family-friendly, and/or fun to drive. Such criteria are not binary, and the user can expect to find on the market several satisfying solutions with different tradeoffs, instead of one ideal car. On the other hand, looking for such criteria using one single unified query on a conventional search engine returns a list of results that does not reflect the user's preferences and does not allow for conscious tradeoffs.
In all these cases, the user should be able to quickly assess the distribution of the results with respect to how they relate to each researched aspect, and then be able to rapidly inspect them. This is possible if the user 1) perceives the distribution to understand which parts of the result space contain interesting information (i.e., what the tradeoffs between the query phrases are) and 2) is able to determine the tradeoff rapidly using the visualization. We present a visual re-ranking technique that uses multi-dimensional ranking and two-dimensional interactive visualization. Inspired by earlier work on visual information retrieval and seeking [33, 2], the technique allows the user to perceive the relevance distribution with respect to multiple query phrases by using a relevance map visualization. A novel feature of this technique is that it allows the user to investigate specific areas on the map by re-ranking the results through pointing at the map. The method estimates document relevance with respect to user-specified query phrases in a multi-dimensional space in which the query phrases define the dimensionality. The method then computes a layout for the documents on a two-dimensional plane where the relative distances of document markers to each query phrase are defined by their respective relevance, the overall relevance of each document is visualized as the radius, and higher document density translates into darker areas (see Figure 1). The visualization allows the user to perceive how the result space is populated with respect to both density and relevance to the query phrases. Rather than relying only on a one-dimensional ranking algorithm to select the documents most relevant to a query, the role of the system is to organize and present information about many documents and multi-dimensional query phrases in a way that makes comparison possible. Re-ranking by pointing allows users to rank documents with respect to relative relevance weights to the query phrases.
For example, a user who wants the ranking to be based a little on both of the query phrases interaction and interfaces, but mainly on the phrase design, can express this simply by pointing to an area on the map that is inside the triangle of the query phrases but closer to design. The approach was evaluated in a controlled laboratory study with 20 participants performing two tasks: perception and retrieval. In the perception task, participants were asked to find out how a document space was populated and organized with respect to specific topics, such as whether there was more research about interaction or design. In the retrieval task, the participants were asked to find documents with varying relevance to several topics, such as a document that was mainly related to design, but slightly related to interaction and interfaces. Our results show a significant improvement in task completion time as well as improved accuracy in perception, and an improvement in task completion time in retrieval, without compromising effectiveness measured as the quality of the task outcome. These results suggest that relevance mapping and re-ranking are effective in cases where the initial one-dimensional result list is not enough for the user to analyze the information. The contributions of this paper are: (1) We present a visual re-ranking approach to multi-aspect information retrieval in which users can perceive the result space and rapidly re-rank the result list by pointing to the visualization. (2) We demonstrate that users can complete perception and re-ranking tasks significantly faster without compromising effectiveness. (3) While different approaches for search result visualization have been proposed in the past, to the best of our knowledge, this is the first study that empirically verifies the benefits of interactive visualization for multi-aspect information retrieval.
2. RELATED WORK

2.1 Visual Information Retrieval and Seeking
Information spaces can be huge and thus hard to comprehend. However, visualizing the space and allowing the user to directly interact with and manipulate objects in the space facilitates comprehension. For instance, when the results of actions are shown immediately and when typing is replaced with pointing or selecting, exploration and retention increase while errors decrease. For information seeking, the following visualization and interaction features are of particular importance: (a) dynamic querying for rapid browsing and filtering to view how results change; (b) a starfield display for the immediate, continuous, scalable display of result sets as different queries are processed; (c) tight coupling of queries to easily use the output of one query as input to another. For instance, a user study indicates that dynamic querying significantly improves user response time and enthusiasm. Using such techniques, systems like FilmFinder support querying over multiple varying attributes such as time, while showing the changing query results in the context of the overall data. User studies also indicate that user interfaces that show the result list together with an overview of the result categories encourage a deeper and more extensive exploration of the information space, especially when the system allows relevance feedback to be given on such categories to direct the exploration [40, 39].

2.2 Document Collection Visualization
Various visualizations have been proposed for large document collections. Most of these techniques adopt the visual information seeking mantra to provide an overview at first and details only on demand. The documents are often visualized on a 2D plane, in the form of a map based on a similarity metric. Higher-level entities, such as topics, are also displayed on the map for immediate and better understanding of the document space organization.
Document Atlas uses Latent Semantic Indexing and multi-dimensional scaling (MDS) to extract semantic concepts from the text and position the documents with respect to the concepts. Document densities around concepts are visualized as a heat map. On mouse hover, common keywords in the area are listed, and on zoom in, more details are shown. Self-Organizing Maps have also been used by systems like WEBSOM and Lin's maps to position the documents on the 2D plane. WEBSOM also suggests areas in the map that could be relevant to the user's search query. Lin's maps are further split up into regions whose area indicates the number of documents with specific related terms. Other techniques visualize the documents as glyphs to indicate additional inter-document relationships and metadata on the map (e.g., [38, 29]). Various metaphors have also been adopted; examples include the terrain metaphor, in which dense regions in the map are seen as mountains with valleys in between [9, 49]; the galaxy metaphor, in which documents are seen as stars in different constellations (document clusters); and the physical metaphor, in which documents are considered to be moving particles and the inter-particle forces move similar documents closer to each other and dissimilar documents apart. Visualizations with two dimensions and meaningful axes (e.g., categories vs. hierarchies, query results vs. query index [6, 23], production vs. popularity) have also been proposed. These visualizations provide an overview of the entire document collection, but they do not allow the user to direct and focus the exploration as required. A user-driven rather than a data-driven technique could be more helpful when searching for documents relevant to multiple keywords. To that end, such a technique should visualize the ranking of documents with respect to multiple keywords so the user can easily judge the relevance of documents to each of the keywords of interest.
However, most of the current techniques only visualize whether a document is relevant or not to a keyword using set visualizations, without showing the document's degree of relevance to each keyword.

2.3 Multi-Aspect Search
In multi-aspect search, the information need of the user consists of more than one aspect or query simultaneously. As a consequence, an item in a collection needs to be ranked differently based on its multiple attributes. The Graphics, Ranking, and Interaction for Discovery (GRID) principles and the corresponding rank-by-feature framework state that interactive exploration of multi-dimensional data can be facilitated by first analyzing one- and two-dimensional distributions and then by exploring relationships between the dimensions, using multi-dimensional rankings to set hypotheses and statistics to confirm them. However, comparing, analyzing and relating different ranks is difficult and requires an interactive visualization that supports the various requirements identified by Gratzl et al. Multi-aspect search support is provided in Song et al., with the proposal of a strategy for the multi-aspect oriented query summarization task. The approach is based on a composite query strategy, where a set of component queries are used as data sources for the original query. Similarly, Kang et al. propose a multi-aspect relevance formulation, but in the context of vertical search. LineUp is an interactive visualization that uses bar charts to support the ranking of objects with respect to multiple heterogeneous attributes. Stepping Stones visualizes search results for a pair of queries, using a graph to show relationships between the two sets of results. Sparkler allows users to visually compare result sets for different queries on the same topic. TileBars visualizes the frequency of different words in various sections of documents as a heat map and ranks the documents accordingly.
Similarly, HotMap uses a two-dimensional grid layout to augment a conventional list of search results with colors indicating how hot (relevant) specific search terms are with respect to the document. Ranking cube is a novel rank-aware cube structure that is capable of simultaneously handling ranked queries and multi-dimensional selections. RankExplorer uses stack graphs for time-series data. Techniques for incomplete and partial data have also been proposed. TreeJuxtaposer was primarily devised to compare rankings. For document collections, the vector space model could be used, such that each document and search query is a vector in a multi-dimensional space, each axis is a term, and the document position is determined by the frequencies of each term in that document. Visualizations of such a model could aid understanding of the document space, but more research is required, particularly for user-driven approaches that allow the user to specify the dimensions of interest.

2.4 User-driven Visualization
VIBE is one of the most well-known user-driven multi-dimensional ranking visualizations for large document collections. To indicate the subspace of interest, the user first enters two or more query terms, known as "points of interest" (POIs). POIs are then shown (as circles) on a 2D plane, together with documents (as rectangles) related to at least one POI, forming a map. The position of each rectangle indicates the relevance of the corresponding document to each of the POIs. The size of a rectangle indicates the relevance of that document to the search query. Citation details of documents selected from the map are listed; clicking on an item in the list opens the full document. Any time a POI is added, removed or moved, the map is updated accordingly.
However, regions of the map with numerous close-by documents are not easily detectable because the rectangles are not color-filled; using semi-transparent color-filled shapes reduces overplotting and facilitates perceptual ordering of different regions in the map by their density. Also, documents are not re-ranked as the user navigates over the map. Variants of VIBE include: WebVIBE, in which POIs act like magnets that attract documents containing related terms; VR-VIBE, which visualizes the space in 3D (for more space to view documents between POIs) and depicts relevance by color; and Adaptive VIBE, in which POIs are query terms (as in VIBE) but also user profile terms that are automatically extracted from user notes. Similar to VIBE, GUIDO, DARE and TOFIR also allow users to specify POIs and display documents based on their relevance to the POIs. However, in GUIDO each POI is an axis (not an icon on a 2D plane) and documents are positioned based on their absolute rather than relative distances from the POIs. In DARE and TOFIR, relevance to POIs is indicated by both distance and angle. Other user-driven systems, like combinFormation, TopicShop and InfoCrystal, retrieve and display search results related to user-defined keywords but do not visualize the results' multi-dimensional ranks. Similarly, HotMap supports a weighted re-ranking of the search results, but without leveraging a graphical interactive approach for specifying the weights. WordBars also supports re-ranking of the search results, but uses additional terms extracted from the search results rather than relying on the query terms. While similar techniques of mapping data to 2D visualizations for better user interpretation have been proposed, the current research lacks understanding of (1) how useful these are for the user and (2) whether they lead to quantifiable benefits in specific tasks related to search activity.
This work is the first to demonstrate a technique where the visualization can be effectively used for re-ranking search results. It is also the first that empirically verifies that users perceive the document space faster and are able to execute retrieval faster without compromising the quality of retrieved information.

3. RELEVANCE MAPPING
The method for relevance mapping is first illustrated with an overview from the user perspective. Then the computation of the layout and document visualization is explained.

3.1 Overview
Figure 2 shows an example of the relevance map visualization. Here, a user investigates a document space delimited by three query phrases with corresponding markers on the map: design, interaction and interface. A fourth query marker, exploration, is greyed out because it has been disabled to permit a temporary focus on the three remaining query markers. The user has positioned the pointer (blue flag with a smiley face) close to interaction to investigate a collection of documents highly related to interaction and more loosely related to interface and design. As a result, the list shows articles ranked with a specific focus on the selected area. Query markers are created by inputting keywords in the query box in the top left. Each query marker can be activated or disabled by clicking it. Documents returned by the system are visualized on the map as semi-opaque dots scattered between the query markers with respect to their individual relevance. The overall relevance of a document is indicated by the radius of the dot.

Figure 2: The relevance map (1) displays documents in relation to multiple query phrases, displayed as red text labels. Here, a fourth query phrase is greyed out (disabled). The exploration cursor (in blue) is located at the user-specified position to be used for the re-ranking. Red document markers indicate the position of articles currently on display in the result list (2).
The partial opacity translates overlapping into a darkened tint that cues the user to the number of document markers in any given area. Query markers can be moved/dragged around on the map, which updates the position of the document markers. The pointer can be repositioned by dragging or tapping on the map. Any change in the pointer position or query marker organization triggers a re-ranking of documents based on their overall relevance and proximity to the pointer. The ranked articles appear in a conventional one-dimensional list layout in the result list (2). Documents being displayed in the result list are shown as red dots on the map. The result list is scrollable. Each document is displayed with its title, authors, publication venue, abstract and keywords. Abstracts are first shown partially but can be displayed in full at a click or a tap. Keywords are interactive, as they can be added to the map as new query markers on a tap.

3.2 Layout
The data used to compute the relevance map layout consists of a set of m query phrases q_{1...m} ∈ Q, a set of k documents d_{1...k} ∈ D and relevance estimates r_{1...k} ∈ R for each of the k documents according to each of the m query phrases. Each query marker and each document marker has a position on the plane, (posq_x, posq_y) and (posd_x, posd_y) respectively. The position of each query phrase marker is defined by the user by moving it to the desired position on the plane. The position of each document marker is computed as a weighted linear combination of the relevance scores to each query phrase and the relative positions of the query markers. Intuitively, document markers are positioned proportionally to their relevance to each of the query phrases. Formally, the position of the jth document marker on dimension dim is:

posd_{j,dim} = \frac{\sum_{i=1}^{|Q|} r_{q_i d_j} \cdot posq_{i,dim}}{|Q|}    (1)

so that posd_{j,dim} is the coordinate of document d_j with respect to dimension dim. On a two-dimensional plane, dim can be x or y. The relevance estimate r_{q_i d_j} of a document with respect to a query phrase is explained in the next section.

3.3 Document Marker Visualization
The radius of the document marker is directly the relevance r_{q_i d_j}; that is, the size of the dot is defined by the relevance. The opacity of overlapping document markers is used to visualize the density of the document mass in a particular position on the plane. We use a standard computation of opacity in which the opacity o of a pixel on the plane is computed as:

o = 1 - (1 - f)^n    (2)

where n is the number of overlapping layers and f is a constant setting the opacity effect of an individual layer, set to f = 0.95.

4. RELEVANCE ESTIMATION
The relevance estimation used in ranking and in computing the document marker layout and size is explained in this section.

4.1 Relevance Estimation
Given the document collection and a set of query phrases that specify the multiple dimensions to be used in ranking and visualization, the relevance estimation method results in a set of probabilities r_{1...k} ∈ R for each of the k documents in the collection according to each query phrase q_{1...m} ∈ Q. To estimate the probabilities from the query phrases Q and documents D, we utilize the language modeling approach of information retrieval. We use a multinomial unigram language model. The vector Q of query phrases is treated as a sample of a desired document, and document d_j is ranked according to a query phrase q_i by the probability that q_i would be generated by the respective language model M_{d_j} for the document; with the maximum likelihood estimation we get

P(q|M_{d_j}) = \prod_{i=1}^{m} \hat{P}_{mle}(q_i|M_{d_j})^{w_i},    (3)

where w_i is the weight of each of the query phrases and is set to w_i = 1/|Q| by default. In case of interactive re-ranking, w_i is weighted based on user interactions as explained in the next section.
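To make the layout and scoring concrete, the computations of Equations (1)–(3) can be sketched as follows. This is a minimal illustration, not the system's implementation: the function names, the marker coordinates and the per-phrase relevance values are our own toy values, and the per-phrase probabilities are taken as given here (their estimation is defined below).

```python
import math

# Sketch of Equations (1)-(3): document marker layout, overlap opacity,
# and weighted query-likelihood scoring. All names and values are toy.

def marker_position(relevances, query_positions):
    """Eq. (1): relevance-weighted combination of the query-marker
    coordinates, normalized by the number of query phrases |Q|."""
    n = len(query_positions)
    x = sum(r * qx for r, (qx, qy) in zip(relevances, query_positions)) / n
    y = sum(r * qy for r, (qx, qy) in zip(relevances, query_positions)) / n
    return x, y

def opacity(n_layers, f=0.95):
    """Eq. (2): combined opacity of n overlapping semi-transparent layers."""
    return 1.0 - (1.0 - f) ** n_layers

def weighted_likelihood(phrase_probs, weights):
    """Eq. (3): weighted product of per-phrase probabilities,
    computed in log space for numerical stability."""
    return math.exp(sum(w * math.log(p)
                        for p, w in zip(phrase_probs, weights)))

# Toy example: three query markers at the corners of a triangle and one
# document with hypothetical relevance to each query phrase.
queries = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
rel = [0.9, 0.3, 0.3]
pos = marker_position(rel, queries)        # position of the document marker
o = opacity(3)                             # three overlaps: nearly opaque
s = weighted_likelihood(rel, [1 / 3] * 3)  # default weights w_i = 1/|Q|
```

With the uniform default weights, the score reduces to the geometric mean of the per-phrase probabilities, which is why the log-space formulation is a natural fit.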
To estimate the relevance r_{q_i d_j} of an individual document d_j with respect to an individual dimension defined by each query phrase q_i, and to avoid zero probabilities, we then compute a smoothed relevance estimate by using Bayesian Dirichlet smoothing for the language model, so that

r_{q_i d_j} = P_{mle}(q_i|M_{d_j}) = \frac{c(q_i|d_j) + \mu\, p(q_i|C)}{\sum_{q} c(q|d_j) + \mu},    (4)

where c(q_i|d_j) is the count of a query phrase q_i in document d_j, p(q_i|C) is the occurrence probability (proportion) of a query phrase q_i in the whole document collection, and the parameter µ is set to 2000 as suggested in the literature.

4.2 Ranking
Given the probability estimates for each of the documents, we apply the probability ranking principle to rank the documents in descending order of their probabilities for the query phrases. These are then used to compute the total ordering of the document list. The top-k ranking computation remains efficient by making use of a priority queue with complexity log(k) for k search results with a presorted inverted index. The user can interactively re-rank the result list by selecting a point on the relevance map. The point for the desired re-ranking is defined by its two-dimensional coordinates rr_x and rr_y with respect to the two-dimensional coordinates of the query markers posq_{ix} and posq_{iy} for the i = 1...|Q| query phrases. The re-rank weighting for the ith query marker is computed from the Euclidean distance between (posq_{ix}, posq_{iy}) and (rr_x, rr_y). Formally,

w_i = \frac{\sqrt{(posq_{ix} - rr_x)^2 + (posq_{iy} - rr_y)^2}}{\sum_{l=1}^{|Q|} \sqrt{(posq_{lx} - rr_x)^2 + (posq_{ly} - rr_y)^2}},    (5)

The re-ranking of the documents is then computed using these distances with Formula 3 by setting the weight w_i accordingly. Intuitively, the distance from the query marker is used as the importance of the query phrase in the ranking of the documents.

5. EXPERIMENTS
The current research lacks understanding of the end-user benefits of interactive visualization in multi-aspect search scenarios.
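Before turning to the experiments, the relevance estimation and pointer-based re-ranking of Section 4 can be sketched as follows. This is a minimal sketch under our own assumptions: the toy corpus, marker positions and function names are hypothetical, Equation (5) is implemented exactly as stated (so a larger distance yields a larger weight), and a heap stands in for the priority-queue top-k selection.

```python
import heapq
import math
from collections import Counter

# Sketch of Equations (4)-(5) and the resulting re-ranking.
# Corpus, positions and names are toy values, not the system's code.

def smoothed_relevance(phrase, doc_tokens, collection_prob, mu=2000):
    """Eq. (4): Dirichlet-smoothed estimate of P(q_i | M_dj)."""
    count = Counter(doc_tokens)[phrase]
    return (count + mu * collection_prob[phrase]) / (len(doc_tokens) + mu)

def rerank_weights(query_positions, pointer):
    """Eq. (5): weight of each query phrase from the Euclidean distance
    between its marker and the selected point, normalized to sum to 1."""
    dists = [math.hypot(qx - pointer[0], qy - pointer[1])
             for qx, qy in query_positions]
    total = sum(dists)
    return [d / total for d in dists]

def rank(docs, phrases, collection_prob, weights, k=10):
    """Top-k documents by the weighted query likelihood of Eq. (3),
    scored in log space and selected with a heap."""
    def log_score(tokens):
        return sum(w * math.log(smoothed_relevance(p, tokens, collection_prob))
                   for p, w in zip(phrases, weights))
    return heapq.nlargest(k, docs, key=lambda name: log_score(docs[name]))

# Toy collection: two documents over two query phrases.
collection_prob = {"design": 0.01, "interaction": 0.02}
docs = {
    "d1": ["design"] * 5 + ["filler"] * 5,
    "d2": ["interaction"] * 5 + ["filler"] * 5,
}
w = rerank_weights([(0.0, 0.0), (2.0, 0.0)], (0.5, 0.0))
ranking = rank(docs, ["design", "interaction"], collection_prob, w)
```

In this toy run the pointer at (0.5, 0) is closer to the design marker, so by Equation (5) the interaction phrase receives the larger weight (0.75 versus 0.25) and dominates the resulting ordering.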
The perceived simplicity and overall familiarity of well-studied conventional search system interfaces – like the current de facto search interface with a typed query and a ranked result list – have not been challenged in experiments that measure the quantifiable benefits in task completion time and effectiveness. We conducted a controlled laboratory experiment in which relevance mapping and re-ranking were compared to a conventional ranked list visualization in two basic tasks that searchers have to perform when using an information retrieval system: perception and retrieval. The perception task sought to understand the benefits of the visualization in perceiving the distribution and density of resulting documents with respect to the multi-aspect query phrases. The retrieval task sought to understand the benefits of the visualization in re-ranking the results according to a user-specified distribution over the importance of the different query phrases (see Figures 3b, 3c1, and 3c2). The benefits were measured with respect to task completion time and effectiveness (quality of the perception or retrieval). The following subsections explain the details of the experiments.

5.1 Hypotheses
The study tested the following four hypotheses:
• H1: Efficient perception hypothesis: The relevance map allows faster perception of the result set.
• H2: Efficient retrieval hypothesis: The relevance map allows faster retrieval of relevant information.
• H3: Effective perception hypothesis: The relevance map allows more accurate perception of the result set.
• H4: Effective retrieval hypothesis: The relevance map allows retrieval of more highly relevant information.

5.2 Experimental Design
The experiment used a 2 × 2 within-subjects design with two search tasks and two systems. The conditions were counterbalanced by varying the order of the systems and tasks.
5.3 Baseline
A baseline system, shown in Figure 3a, was implemented to enable comparability and to ensure that the evaluation revealed the effects solely of the features enabling relevance mapping and re-ranking. The baseline used the same data collection as well as the same document ranking model. All retrieved information in the baseline system was displayed in a ranked list layout. The baseline did not feature a relevance map, and the ranking was based on a single query at a time. The baseline used the same hardware, i.e. a multi-touch-enabled desktop computer with a physical keyboard.

5.4 Tasks
The experiment consisted of two tasks, perception and retrieval, which are explained below and exemplified in Figures 3b, 3c1, and 3c2. Both tasks used a common set of four topics, either (1) interaction, tabletop, tangible, and prototyping, or (2) surfaces, exploration, visualization, and sound. The two sets of topics were formed by two researchers who were experts in human-computer interaction. The same researchers were then asked to assess the task outcomes of the participants.

5.4.1 Perception Task
The perception task aimed to measure task completion time and effectiveness, to help understand how a document space is populated and organized with respect to specific query topics. Participants were asked the two following questions: (1) "Out of the 4 topics provided, which 2 topics are related to the highest number of relevant documents?", and (2) "Out of the 4 topics provided, which 3 topics are related to the highest number of relevant documents?". An example visualization from which the user had to select the topics is shown in Figure 3b. In that case, we can see that the space delimited by tabletop, tangible, and interaction is the most densely populated through the sheer number of document markers, making them part of the answer. To find the two keywords, participants must then compare pair-wise document density by focusing on the edges between query phrase markers, with a slight but noticeable lead in density (encoded as darkness) between tangible and interaction.

Figure 3: In the perception task, participants must identify the two and three keywords out of four that are the most related to relevant information. In the baseline (a), they must skim through the ranked list of results to infer the most prevalent keywords from the top articles. Using the relevance map, they must interpret the distribution of document markers. In the retrieval task, participants must find an article that shows a high relevance to one keyword (say, tabletop), and a lesser relevance to two other keywords (say, tangible and interaction). Using the baseline (a), they must query the three keywords, then find a fitting article in the result list. Using the relevance map, they point (by tapping on the touch-enabled monitor) at an area between the three keywords (c1), somewhere closer to tabletop than tangible or interaction, which triggers a re-ranking of retrieved articles based on the selected position (c2). The participant should be able to select one of the top articles as a fitting task outcome.

5.4.2 Retrieval Task
The retrieval task aimed to measure task completion time and effectiveness in finding documents with varying multi-dimensional relevance toward several topics. Participants were given the following instruction: "Find one article that is highly relevant to 'Topic A' and slightly related to 'Topic B' and 'Topic C'.". The task was then repeated one more time with a different topic priority: "Find one paper that is highly relevant to 'Topic B' and slightly related to 'Topic A' and 'Topic C'."
An example sequence, in which the user points at the visualization to re-rank the document list from which documents are then selected, is shown in Figures 3c1 and 3c2.

5.5 Measures
We used two performance measures: task completion time and effectiveness. Task completion time measured the time required to complete the task; effectiveness measured the quality of the task outcome.

5.5.1 Task Completion Time
Task completion time was computed directly as the duration in seconds from the beginning of the task to its completion.

5.5.2 Effectiveness
Effectiveness was computed differently for the two tasks, and the corresponding ground truths for the task outcomes were defined differently. In the perception task, effectiveness was measured as the accuracy of the participant's answer. The ground truth was available from the relevance estimation and was computed as the sum of the relevance scores associated with each query phrase representing a topic. The topics were then ordered by this sum, and the top 2 and top 3 topics corresponding to the task description were selected as the ground truth to which each answer was compared. Accuracy was computed for each answer, resulting in a grade of 1 for a match, 0 for a mismatch, and, in the case of two topics selected out of four, 0.5 for a partial match. As each participant returned two answers, effectiveness was measured as the mean of both grades. In the retrieval task, effectiveness was measured as precision on the documents selected by the participants. All documents chosen by any of the participants in either of the two system conditions were pooled. Two experts then assessed the actual relevance of each document to each topic. Because the experts were authors of the experimental design and had themselves devised the topics, potential bias in the assessment was addressed by following a strict double-blind procedure (i.e., the experts had no knowledge of the participant, the system, or concurrent assessments) and by balancing the use of each set of topics across both conditions. For each document, the experts assigned a grade between 0 (non-relevant) and 5 (highly relevant) to each of the topics; these grades were then averaged (mean) into a final grade. The topic defined as highly relevant was given a double coefficient so that the final grade reflected the weighted aspect of the task. The final grade indicated the expert opinion on how relevant the document was for the task. The inter-annotator agreement between the experts was measured using Cohen's Kappa for two raters who provided three relevance assessments per document. Agreement was found to be substantial (Kappa = 0.684, Z = 7.04, p < 0.001), indicating that the expert assessments were consistent. Additionally, we recorded the position in the result list of each document returned by each participant, to better understand the re-ranking/scrolling tradeoff.

5.6 Data logging and data collection
For the task completion time measurement, we recorded (1) the task duration from the Start button press to the Submit button press. For the effectiveness measurement, we recorded (2) the bookmarked documents. After completing both tasks in both conditions, participants were given a questionnaire to collect data on their age, gender, academic background, and research experience. We used a document set including all articles available in the Digital Library of the Association for Computing Machinery (ACM) as of the end of 2011. The information about each document consists of its title, abstract, author names, publication year, and publication venue. Articles with missing metadata were excluded during the indexing phase, resulting in a database of over 320,000 documents. Both the baseline and the proposed system used the same document set, and the users were presented with the top 2000 documents.
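The effectiveness grading described in Section 5.5.2 can be expressed compactly. The sketch below follows the stated rules; treating the double coefficient as a weighted mean over the three topic grades is our assumption about how the final grade was combined:

```python
def perception_grade(answer, ground_truth):
    """Grade one perception answer (a set of selected topics):
    1 for an exact match, 0.5 for a partial match when two topics
    are selected out of four, 0 for a mismatch."""
    answer, ground_truth = set(answer), set(ground_truth)
    if answer == ground_truth:
        return 1.0
    if len(ground_truth) == 2 and len(answer & ground_truth) == 1:
        return 0.5
    return 0.0

def retrieval_grade(topic_grades, primary_topic):
    """Combine per-topic expert grades (0-5) into a final grade,
    doubling the coefficient of the highly relevant topic.
    A weighted mean is assumed here."""
    weights = {t: (2.0 if t == primary_topic else 1.0) for t in topic_grades}
    total = sum(weights[t] * g for t, g in topic_grades.items())
    return total / sum(weights.values())
```

For example, an answer naming one of the two ground-truth topics scores 0.5, and a document graded 4 on the highly relevant topic and 2 on the other two receives (2·4 + 2 + 2) / 4 = 3.0.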
5.7 Participants
Twenty researchers in computer science (40% female) from two universities, ranging in age from 21 to 36 years and in research experience from 1 to 8 years, volunteered to participate in the experiment. All participants were compensated with a movie voucher, which they received at the end of the experiment. All participants performed the same experimental tasks on both systems, with the order of the systems systematically varied. Informed consent was obtained from all participants.

5.8 Apparatus
Participants performed the experiment on a desktop computer with a 27" multi-touch-enabled capacitive monitor (Dell XPS27). The computer ran Microsoft Windows 8, and both systems, being Web based, were used in the Chrome Web browser, version 45.0.2454.85 m. A physical keyboard was provided for text input, whereas pointing, dragging, and scrolling were performed through touch interaction. The search engine implementing the relevance estimation method ran on a virtual server, and the document index was implemented as an in-memory inverted index, allowing fast response times with an average latency of less than one second.

5.9 Procedure
The tasks were described on individual instruction sheets that incorporated one of the two sets of keywords, which we refer to as the task versions. The duration of the tasks was not constrained. To avoid introducing confounding variables, we counterbalanced the tasks by systematically varying the order of the systems, the order of the task versions, and which task version was allocated to each system. Considering the novelty of the visualization, a training version of the tasks was devised, allowing participants to use both systems with comparable proficiency. A training task had to be completed with each system, right before the main task, using a separate set of four keywords: creativity, collaboration, children, and robotics.
The training started with the participant receiving a tutorial on how to use the system; then, while performing the training task, the participant could ask questions about either the task or the system. As soon as the training task was completed and the participant had no more questions, the actual experiment started. Participants were asked to underline their chosen answers on the instruction sheet. In the retrieval task, we asked the participants to bookmark the chosen articles. A Start/Submit button was added to both systems in the upper right corner: participants had to tap Start when ready to perform each task and Submit when they had completed it.

6. RESULTS
The results of the experiment regarding performance are shown in Table 1 and illustrated in Figure 4 with respect to the selected measures, task completion time and effectiveness, reported for both tasks, perception and retrieval. The mean position of selected articles in the result list is also illustrated in Figure 4. The results are discussed in detail in the following sections.

6.1 Task Completion Time
Significant differences were found between the systems in both tasks, as discussed below.

6.1.1 Perception Task
The results of the perception task show that participants spent substantially less time completing the task when using the relevance map than when using the baseline system. The mean task duration was 84.23 seconds for the relevance map and 177.72 seconds for the baseline system. The difference between the systems was statistically significant (Wilcoxon matched-pairs signed-rank test: Z = 3.27; p < 0.001). In conclusion, the relevance map shows a 111% improvement and was therefore more efficient for the perception task, confirming H1.

Table 1: Task completion time and effectiveness results for both tasks. Task completion time is reported as the duration of the task averaged over participants. Effectiveness in the perception task is reported as the mean topic quality averaged over participants, and effectiveness in the retrieval task as the mean document quality averaged over participants.

                         Task Completion Time (s)                      Effectiveness
              Baseline (B)      Map (M)        B vs. M       Baseline (B)   Map (M)      B vs. M
              M       SD        M       SD     Wilcoxon      M     SD       M     SD     Wilcoxon
Perception    177.72  116.20    84.23   39.38  p < 0.001     0.75  0.23     0.89  0.17   p = 0.013
Retrieval     137.53  101.66    80.93   70.65  p < 0.001     0.70  0.13     0.71  0.12   p = 0.95

Figure 4: Results from the performance measures displayed for both systems with confidence intervals: (a) task completion time in the perception task and (b) task completion time in the retrieval task, with the mean duration (lower is better); (c) effectiveness in the perception task, with the mean topic quality, and (d) effectiveness in the retrieval task, with the mean document quality (higher is better); (e) mean position in the result list of articles selected in the retrieval task.

6.1.2 Retrieval Task
In the retrieval task, participants spent substantially less time completing the task when using the relevance map than when using the baseline system. The mean task duration was 80.93 seconds for the relevance map and 137.53 seconds for the baseline system. The difference between the systems was statistically significant (Wilcoxon matched-pairs signed-rank test: Z = 3.87; p < 0.001). In conclusion, the relevance map shows a 70% improvement and was therefore more efficient for the retrieval task, confirming H2. Using the relevance map, participants selected articles close to the top of the result list, with a mean position of 1.48 (SD = 1.20), while the mean position of the selected article for the baseline system was 6.33 (SD = 7.20).
The difference between the systems was statistically significant (Wilcoxon matched-pairs signed-rank test: Z = 4.77; p < 0.001).

6.2 Effectiveness

6.2.1 Perception Task
In the perception task, effectiveness as measured by the accuracy of the topics selected by the participants was 0.89 on the relevance map, while accuracy on the baseline system was 0.75. The difference between the systems was statistically significant (Wilcoxon matched-pairs signed-rank test: Z = -2.46; p = 0.013). In conclusion, the relevance map was more effective for the perception task, confirming H3.

6.2.2 Retrieval Task
No statistically significant difference in the relevance of the retrieved documents was found in the retrieval task (Wilcoxon matched-pairs signed-rank test: Z = -0.07; p = 0.95). The fourth chart in Figure 4 shows very similar results for both systems. This result fails to confirm H4, but it shows that the improvement in task completion time observed in the retrieval task did not impair the quality of the retrieved documents.

7. DISCUSSION
The results of the experiments show significant improvements in task completion time in both perception and retrieval, without compromising effectiveness. These results confirm hypotheses H1, H2, and H3. In the perception task, participants were able to use the relevance map visualization to make decisions with greater accuracy, 111% faster. The visualization allowed the participants to understand more accurately the distribution of information with respect to the multiple aspects of the query. In the retrieval task, documents fitting complex criteria were retrieved 70% faster using re-ranking through interaction with the relevance map.
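The pairwise comparisons reported above rely on the Wilcoxon matched-pairs signed-rank test. A minimal, stdlib-only sketch of the test statistic W is shown below with hypothetical paired timings (not the study's data); computing the p-value from W via exact or normal-approximation tables is omitted:

```python
def wilcoxon_signed_rank(x, y):
    """Wilcoxon matched-pairs signed-rank statistic W: rank the
    absolute paired differences (zeros dropped, ties given average
    ranks), then return the smaller of the positive-rank and
    negative-rank sums."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a group of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_pos, w_neg)

# Hypothetical paired task durations (seconds) for three participants.
baseline_times = [180, 160, 210]
map_times = [90, 80, 100]
w = wilcoxon_signed_rank(baseline_times, map_times)
```

When every participant is faster with one system, all differences share a sign and W = 0, the most extreme value the statistic can take.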
Whereas finding documents with different degrees of relevance to several topics normally requires users to go through long lists of results and assess the relevance of individual documents, our method of re-ranking by pointing at the map successfully narrows the top results down to documents that fit the criteria. The quality of the task outcome was the same in both conditions in the retrieval task, which failed to confirm hypothesis H4. A possible reason for the equal performance is the absence of strict time constraints for completing the tasks. It is possible that a constrained completion time would have negatively impacted the quality of the task outcome for the baseline: the participants would not have been able to carefully examine the list to find a fitting article, but would have been forced to skim, possibly resulting in lower quality of the selected topics and articles. While our results show substantial improvements over the baseline, there is a tradeoff between the perceived simplicity of a result list and the added visual complexity of a relevance map. Interaction-wise, a result list is explored by scrolling, while a relevance map requires more complex behavior, justifying the use of a training session and tutorial. In the context of the present experiment, the necessity of a tutorial introduces a risk of steering participants toward optimal behaviors that may outperform self-devised strategies. While our results suggest that the design of such visual interfaces can make both retrieval and perception faster, simpler interfaces may be more effective when the cost of interaction is higher, e.g., on smaller devices and in mobile scenarios. We see further research directions to be addressed. First, different task complexities could be investigated and open-ended tasks explored, in which users would have more control over the search process.
Second, more realistic search situations outside of our present laboratory experiment could be studied to investigate interaction with the relevance mapping and re-ranking functionality in settings where users could pursue their own areas of interest and determine whether the suggestions effectively met their preferences and expectations.

8. CONCLUSION
Conventional systems for information retrieval are not designed to provide important insights into the data, such as the relevance distribution of the results with respect to the user's query phrases. In this paper, we introduced visual re-ranking, an interactive visualization technique for multi-aspect information retrieval that helps overcome such limitations. The method proved successful in substantially improving performance on complex analytical tasks. The evaluation showed that users are able to make sense of the relevance map and take advantage of the re-ranking interaction to lower the time required to make analytic decisions or retrieve documents based on complex criteria. These results suggest that the conventional one-dimensional ranked list of results may not be enough for complex search-related tasks that go beyond simple fact finding.

9. ACKNOWLEDGMENTS
This research was partially funded by the European Commission through the FP7 Project MindSee 611570 and the Academy of Finland (Multivire, 255725, 278090 and 305739). The data used in the experiments is derived from the ACM Digital Library.