Changing a user's search experience by incorporating preferences of metadata

MIRAN ALI

Degree project in DD221X for Master's degree in Computer Science, second level
Stockholm, Sweden 2014

KTH Royal Institute of Technology
Computer Science and Communication (CSC)
Abstract
Implicit feedback is usually data that comes from users' clicks, search queries and text highlights. It exists in abundance, but it is riddled with noise and requires advanced algorithms to make good use of it. Several findings suggest that factors such as click-through data and reading time could be used to create user behaviour models in order to predict the users' information need.
This Master’s thesis aims to use click-through data and
search queries together with heuristics to create a model
that prioritises metadata-fields of the documents in order to
predict the information need of a user. Simply put, implicit
feedback will be used to improve the precision of a search
engine. The Master’s thesis was carried out at Findwise
AB - a search engine consultancy firm.
Documents from the benchmark dataset INEX were indexed into a search engine. Two different heuristics were
proposed that increment the priority of different metadata-fields based on the users' search queries and clicks. It was
assumed that the heuristics would be able to change the
listing order of the search results. Evaluations were carried out for the two heuristics and the unmodified search
engine was used as the baseline for the experiment. The
evaluations were based on simulating a user that searches
queries and clicks on documents. The queries and documents, with manually tagged relevance, used in the evaluation came from a data set given by INEX. It was expected
that listing order would change in a way that was favourable
for the user; the top-ranking results would be documents
that truly were in the interest of the user.
The evaluations revealed that the heuristics and the baseline behave erratically and that the metric never converged to any specific mean-relevance. A statistical test revealed that there is no significant difference in accuracy between the heuristics and the baseline. These results mean that the proposed heuristics do not improve the precision of the search engine; several factors, such as the indexing of redundant metadata, could have been responsible for this outcome.
Referat
Changing a user's search experience by incorporating metadata preferences

Implicit feedback is usually data that comes from users' clicks, search queries and text highlights. This data exists in abundance, but contains too much noise and requires advanced algorithms in order to be exploited. Several findings suggest that factors such as click data and reading time can be used to create behaviour models in order to predict the user's information need.
This Master's thesis aims to use click data and search queries together with heuristics to create a model that prioritises metadata-fields in documents so that the user's information need can be predicted. In other words, implicit feedback is to be used to improve the precision of a search engine. The Master's thesis was carried out at Findwise AB - a consultancy firm specialising in search solutions.
Documents from the INEX evaluation data set were indexed into a search engine. Two different heuristics were created to change the priority of the metadata-fields based on the users' search and click data. It was assumed that the heuristics would be able to change the ordering of the search results. Evaluations were carried out for both heuristics, and the unmodified search engine was used as the baseline for the experiment. The evaluations consisted of simulating a user who searches for queries and clicks on documents. These queries and documents, with manually tagged relevance data, came from a data set provided by INEX.
The evaluations showed that the behaviour of the heuristics and the baseline is random and erratic. Neither of the heuristics converges towards any specific mean-relevance. A statistical test shows that there is no significant difference in measured accuracy between the heuristics and the baseline. These results mean that the heuristics do not improve the precision of the search engine. This outcome may be due to several factors, such as the indexing of redundant metadata.
Contents

Acknowledgements
1 Introduction
    1.1 Problem Statement
    1.2 Contributions
    1.3 The Project Provider
    1.4 Report Outline
2 Background
    2.1 Ranked Retrieval
        2.1.1 Vector Space Model
    2.2 Solr
    2.3 Implicit Feedback
        2.3.1 Evaluating Differences
        2.3.2 Drawbacks of Implicit Feedback
        2.3.3 Different applications
        2.3.4 User Behaviour Models
3 Methodology
    3.1 Heuristics
        3.1.1 Field value boost
        3.1.2 Field value boost with dampening effect
    3.2 Gathering Data
    3.3 Evaluation
    3.4 Technical Configurations
        3.4.1 Search Engine
        3.4.2 Jellyfish Service
        3.4.3 Searching interface
        3.4.4 Database
4 Results
    4.1 Plots
    4.2 Analysis
        4.2.1 Test case 1
        4.2.2 Test case 2
5 Conclusions
    5.1 Research questions
    5.2 Future improvements
References
Acknowledgements
This Master’s thesis would not have been a possibility without the help of numerous
academics, experts within the field of information retrieval, and loved ones.
I was preoccupied fighting potential dengue risks and extreme humidity in Singapore when I received joyful news from Simon Stenström about conducting my
Master’s thesis at Findwise AB. They gave me a chance to work with a field that
was close to my heart and do so in a very modern and professional work environment. Evidently, I made the wise choice of accepting the offer.
Working at the Findwise office turned out to be an experience that gave me first-hand contact with developers who strived to be as helpful as possible to all of the thesis workers. This behaviour was highly prevalent in my supervisor at Findwise - Meidi Tõnisson. Not only was it possible for me to send her my inquiries at almost any time; as a former thesis worker at Findwise herself, she was also able to help me with the organisation of the project as well as give me a skill set to manage some of the technical components of development. Martin Nycander and
Simon Stenström were responsible for holding an internal lecture at Findwise that
was very helpful, since it covered how Findwise’s own products are used and how
they could be utilised in the project. I would also like to thank the rest of the
employees for giving me several laughs at the lunch table and also for the times
they reached out and gave me help with annoying bugs in software.
I feel extremely privileged to have had Hedvig Kjellström as my academic supervisor. She had a constant presence during the course of the Master’s thesis
through e-mail contact and insisted that I should meet her on a regular basis in
order to receive feedback. Integral parts of this report would not have been possible
to conceive if not for Hedvig’s expertise on the subject.
Finally, I would like to thank my wife Sivan. Truly the light of my life and
my kindred spirit, she has always been like a pillar that supports me during tough
times and she always has time to listen to me rambling on about my Master’s thesis.
Thank you all for your everlasting support and patience!
Chapter 1
Introduction
The field of information retrieval has gone from a primarily academic principle to
being the most common way for people to access information on a daily basis [1].
The most common application of IR is the multitude of search engines that can be
found online. While IR started off as a means for information professionals to find relevant data - scientific innovations, superb engineering and a massive price drop in computer hardware have led to search engines that are capable of searching through several million pages with sub-second response times for hundreds of millions of searches every day [1].
While the major search engines are doing a fantastic job at searching through
millions of indexed pages, there is always room for improvement. Joachims and
Radlinski [2] touch upon this subject and mention that Implicit Feedback (IF)
is important if we are to advance beyond the one-ranking-fits-all approach. An
application of IF means that a user’s behaviour is used to adapt the search engine to
fit that particular user. One way of going beyond the one-ranking-fits-all approach
is by incorporating user behaviour models into the search engine [3, 4, 5]. The
authors in the aforementioned citations assume that users’ informational needs will
change over time and, thus, create user behaviour models that are specific for certain
sessions; the user wants to know more about the programming language Java today
by typing the query ”Java”, but will ask for information about Java Island tomorrow
with the same query. Search engines are to quickly adapt to this kind of behaviour
and the aforementioned models help in abstracting this.
This report will delve into subjects like ranked retrieval, implicit feedback and
user behaviour models. All of these are deeply intertwined, as the reader of this
report will notice. Different ways to permute the ranking of documents, given
by a search engine, will be proposed and implemented. Evaluation of heuristics
will be performed and the data will go through statistical methods to deduce if the
proposed heuristics improved the user’s search experience.
A lot has been done in the world of IR and the main part of this report focuses
on customizing the ranking of a state-of-the-art search engine through different
heuristics. The suggested way of performing this customization will stand as the
academic contribution of this report.
1.1 Problem Statement
The purposes of this Master’s thesis are to investigate, implement and evaluate
a search engine that uses click-through data combined with corresponding search
queries to prioritise certain fields1 in the search results. This is done in order to
change the ranking of search results and make sought documents appear further up
in the list.
The author’s hypothesis for this project is that
Hypothesis 1: the proposed solutions will be able to predict the users’ sought
documents significantly better than the unmodified search engine.
The research questions that will guide the author throughout this Master’s thesis
are:
1. Do the evaluations of the search engine heuristics indicate that the author's
hypothesis might be accurate?
2. What are the desirable/undesirable effects of the solution?
3. Is it reasonable to assume that the proposed solution actually improves the
users’ search experience?
1.2 Contributions
This project’s contribution consists of a new way to combine different sources of
implicit feedback, namely search queries and click data, to adapt a search engine to a user's searching patterns.
1.3 The Project Provider
Findwise AB is a consultancy agency that focuses on, among other things, enterprise
search [6]. The office is located in the center of Stockholm. Findwise AB has, in its role as project provider, contributed an abundance of technical components
and immense guidance. They have encouraged the thesis workers to use practices
such as Scrum, in order to guarantee regular deliveries in a short period of time.
It is possible that the findings in this report might be used in an actual Findwise
project if they were to create custom search solutions.
¹ The words fields, metadata and metadata-fields will be used interchangeably throughout this report.
1.4 Report Outline
The rest of the document is organized as follows. Chapter 2 covers the previous
work done in the field of ranked retrieval and implicit feedback. Chapter 3 covers
the methodology and the approach taken to solve the problems at hand and how the
experiments were set up for training and testing the heuristics. Chapter 4 illustrates
and discusses the results and, finally, Chapter 5 contains the concluding remarks of
the Master’s thesis.
Chapter 2
Background
The following sections cover the different techniques that were used for the project
and also what previous researchers have accomplished in the field of information
retrieval.
2.1 Ranked Retrieval
Early Information Retrieval (IR) systems relied on boolean retrieval. This meant
that users would satisfy their information need by constructing complex queries
that consisted of boolean ANDs, ORs and NOTs. The drawbacks of these systems
were that it was hard for a user to construct queries when their information need
was extremely specific and there was no notion of document ranking, according to
Singhal [7]. Singhal continues by saying that there are certain power users that still
use boolean retrieval (probably librarians), but most users expect IR systems to
perform ranked retrieval.
The basis of IR systems with ranked retrieval is to sort the list of found documents by how useful they are with regard to the given query. Documents are usually
assigned numeric scores and sorted on these scores in descending order.
Different models exist to shape an IR system that is capable of ranked retrieval.
Some of these models are the vector space model and different probabilistic models.
But the one that is relevant for this Master’s thesis is the vector space model, since
the search engine used in the experiments utilised this kind of model.
2.1.1 Vector Space Model
The vector space model is, according to Manning et al. [1], fundamental to a host
of IR operations (e.g. cosine similarity). There are several aspects that comprise
the vector space model such as calculation of similarities, dot products and vectors.
Vector

$\vec{V}(d)$ is the vector that represents document $d$, where there is one component for each term in the document. The components are comprised of a term and its corresponding weighting score. This score is usually dependent on factors such as the term's number of occurrences in the document and the number of documents that contain this term. A very popular weighting score is the tf-idf score (term frequency - inverse document frequency). The dash between tf and idf should not be mistaken for a subtraction sign; the tf-idf score is calculated by multiplying the factors $tf$ and $idf$. While $tf_{t,d}$ represents the term frequency of a term $t$ in document $d$, $idf_t$ represents the inverse document frequency for a term $t$, or:

$$idf_t = \log \frac{N}{df_t} \quad (2.1)$$
The reason for using inverse document frequency in the weighting score is that words are not of equal significance. The few documents that contain the term $t_1$ are to be given a boost when a user searches for the query $t_1$. Similarly, the vast number of documents that contain the term $t_2$ should not be given a boost to their total score if a user searches for $t_2$.
Using (2.1), the tf-idf formula can be written as:

$$tf\text{-}idf_{t,d} = tf_{t,d} \cdot \log \frac{N}{df_t} \quad (2.2)$$
Weigel et al. shed light on two issues that arise when tf-idf scoring is applied to structured data: (1) which units of the data are to be treated as coherent pieces of information, and (2) the structural similarity problem, which concerns quantifying the distance of a document to a given query [8]. Both of these issues are somewhat relevant to this thesis, since the data used for the experiments consists of structured data.
A set of documents can be seen as a set of vectors in a vector space, where there
is one axis for each term. It is important to note that this representation makes
ordering of terms completely insignificant. This means that the sentence ’Sivan is
brighter than Miran’ is the equivalent of ’Miran is brighter than Sivan’ once both
sentences are converted to vectors; semantics are not preserved.
Similarity
A standard way of calculating the similarity between two documents is to use the
cosine similarity of the vector representations of the two documents:
$$sim(d_1, d_2) = \frac{\vec{V}(d_1) \cdot \vec{V}(d_2)}{|\vec{V}(d_1)||\vec{V}(d_2)|} \quad (2.3)$$
The numerator represents the dot product of the vectors that represent documents $d_1$ and $d_2$, and the denominator is the product of the vectors' Euclidean lengths.
The cosine similarity is a central part of this Master’s thesis, as it is used to
calculate the similarity between the query and different parts of the documents’
structure. Queries can also be represented as a vector in the same manner as
the document vectors. This means that (2.3) can be used to calculate the cosine
similarity between the vector that represents document d and query q:
$$sim(q, d) = \frac{\vec{V}(q) \cdot \vec{V}(d)}{|\vec{V}(q)||\vec{V}(d)|} \quad (2.4)$$
Equation 2.4 was used by the author when a measurement of the similarity was
to be extracted between a query and one of the metadata-fields in the document.
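As a concrete illustration of Equations (2.1)-(2.4), the following minimal Python sketch builds tf-idf vectors and computes the cosine similarity between two pieces of text. It assumes simple whitespace tokenisation and a precomputed document-frequency table; it is a sketch, not the code used in the thesis.

import math
from collections import Counter

def tfidf_vector(text, doc_freq, n_docs):
    # Term frequency times inverse document frequency, as in Eq. (2.2).
    tf = Counter(text.lower().split())
    return {t: f * math.log(n_docs / doc_freq.get(t, 1)) for t, f in tf.items()}

def cosine_similarity(vec_a, vec_b):
    # Eq. (2.4): dot product divided by the product of the Euclidean lengths.
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0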
2.2 Solr
The Solr search engine played a very central role in the project. It was, together with its accompanying database, used for indexing data and searching through this data with sub-second response times. Below is a description of the most important aspect of the Solr search engine - the query fields (qf) parameter.
The qf parameter enabled the author to search for a query among a certain set
of metadata-fields. In addition to this, the qf parameter also allowed the author to
weight the fields differently. This means that the parameter can be used to make a
query match in one field more significant than a query match in another field; fields’
importance can be differently weighted and this was exactly the kind of behaviour
that was pursued during the course of the project.
The qf-string has the following structure:
$$qf = field_1^{value_1} + field_2^{value_2} + \ldots + field_n^{value_n} \quad (2.5)$$
The + in Equation 2.5 should not be confused with the addition operator. This
is merely a way for Solr to append additional data to a parameter.
The official term for what happens in Equation 2.5 is referred to as boosting.
As stated previously, this means that more emphasis is put on boosted fields of the
documents [9]. In rough terms, the implicit feedback that is acquired from a user is
meant to be used to update the values that will be used in the qf-parameter; user
behaviour will dictate how fields’ importance are weighted.
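For illustration, the sketch below shows how such a qf-string could be passed to Solr's eDisMax query parser. The field names, boost values, core name and host are invented for the example and are not taken from the thesis configuration.

from urllib.parse import urlencode

field_boosts = {"title": 2.4, "creator": 1.7, "tags": 1.0}   # hypothetical boost values
qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())

params = urlencode({"q": "java", "defType": "edismax", "qf": qf, "rows": 10})
print("http://localhost:8983/solr/books/select?" + params)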
2.3 Implicit Feedback
The majority of this section contains descriptions of different implementations of
implicit feedback (IF). Apart from being used in different Recommender Systems
(RS) and video retrieval systems, IF is almost exclusively used for document retrieval systems and the mentioned applications will mostly touch upon this field of
study.
2.3.1 Evaluating Differences
While there are many proceedings and reports discussing IF, most of them have
spent much time evaluating the applications that use IF. This means that the benefits of IF are not a given, and authors take great caution when hypothesising positive
correlations between document usability and the use of IF. Some articles in the bibliography contain a section dedicated to determining any correlation between the
use of IF and search engine improvement or other related metrics.
This is highly present in a report written by Kelly and Belkin [10]. They hypothesized that scrolling frequency, reading time and document interaction were great
sources of IF and would help users find documents they thought were relevant. Statistical methods showed that there was no statistically significant difference between
relevant and non-relevant documents when using any of the three aforementioned
sources.
2.3.2 Drawbacks of Implicit Feedback
Joachims and Radlinski [2] claimed that IF is used in most search engines and IF is
important if search engines are to advance beyond the one-ranking-fits-all approach,
but IF is also noisy and biased, making simple learning strategies doomed to fail.
The same authors mentioned that IF has a problem with trust-bias - users rely
heavily on the ranking given by the search engine. Joachims et al. [11] formed
several strategies that incorporate IF and conclude that even the best strategies
were less consistent than explicit feedback results, but they mentioned that the
virtually infinite amount of implicit feedback combined with good machine learning
algorithms should suffice in closing the performance gap between the two types of
feedbacks.
IF cannot be negative and is interpreted on a relative scale [12]. The abundance
of IF has the downside that it is highly noisy [13].
2.3.3 Different applications
As mentioned in the beginning of the section, IF can also be used in Recommender
Systems (RS). The use of RS is not uncommon for services like Netflix and Amazon
[14]. Jawaheer et al. [12] used both explicit and implicit feedback to create an RS
in a music context. They mentioned that users are usually reluctant to give explicit
feedback because of the cognitive effort it requires. In this context, IF is considered
to be the number of times a person has listened to a certain song. They devised a method that combines implicit and explicit feedback to calculate ratios of interest, and these ratios decide which artist should be recommended to the user.
It is important to note that unlike explicit feedback, IF only gives a relative
scale in this case [12]. While a user explicitly down- or upvotes a song to express
that he/she dislikes or likes the song, IF can only be interpreted in relative terms.
A user listening to a song 10 times does not give much information, but if the same
user listened to another song 100 times, one could conclude that the latter song is
preferred over the former one. This is an example of IF.
Hopfgartner and Jose [15] evaluated implicit feedback models for their video
retrieval engine. By defining different user behaviour models, they knew what kind of IF data to expect from users if they were to deploy the search engine and include it in user studies; the user behaviour models were also used to form a simulator that accessed data according to the different models.
Another popular application of IF is in the form of Collaborative Filtering (CF), which lets the majority of users' preferences dictate what the common user likes
and dislikes. The drawback of CF lies in something called the cold start problem
[13]; in the beginning, there is relatively little data about each user, which results
in an inability to draw inferences to recommend items to users [16].
Zigoris and Zhang [17] used Bayesian models together with explicit and implicit
feedback to solve the aforementioned cold start problem. The authors claim that
the field of IR is moving into a more complex environment with user-centered or
personalized adaptive information retrieval. In agreement with the other authors,
Zigoris and Zhang wrote that the user does not like to provide explicit data, which is
why IF receives much attention; user data can be gathered without users explicitly
telling the search engine about their informational need.
Kim et al. [18] mentioned that explicit feedback is not practical, since users are
unwilling to give such feedback. Just like Kelly and Belkin, Kim et al. believed that reading time would not help in finding which documents users find relevant. Even though this was Kim et al.'s initial belief, they actually found a significant difference in reading time between non-relevant and relevant documents [18]. The author of this Master's thesis believes that the difference in results stems from the fact that the test subjects in Kim et al.'s report had been given a document set that they were familiar with. This was not the case with the test subjects in Kelly and Belkin's report [10]; these authors concluded that the lack of a statistically significant difference was mainly because the test subjects had a hard time finding relevant articles, since they were not experts in the field covered by the document set.
Although the most common disadvantage of IF is the fact that it is so noisy, it
comes with some great advantages as well. Yuanhua et al. [19] discussed the state
of search engines and wrote that they are not good for personalized searches. IF is
used to remedy this problem and they argued that the advantage of IF is that users
constantly provide search engines with IF.
There are many factors of user behaviour that can be used as a source of IF.
One example is eye movement. Buscher et al. [20] used eye tracking data to analyse
how users read documents, what parts were read and how these parts were read.
The data was then used for personalization of future search results.
Another factor that can be used as IF is text selection. White and Buscher
[21] analysed the text that users highlight and compared this highlighted text to the
search query. The calculated similarities were then to be used for the same purpose
as stated in the previous article - information retrieval personalization.
2.3.4 User Behaviour Models
Agichtein et al. [3] incorporated user behaviour data into the ranking process and
showed that the accuracy of a modern search engine can be significantly improved.
Implicit feedback was modeled into several features like click-through features,
browsing features, time on page and query-text features. Their tests showed that
their ranking algorithms significantly improved their accuracies when incorporating
their proposed implicit feedback model.
Shen et al. [4] identified a major deficiency in existing retrieval systems - they
lack user modelling and are not adaptive to individual users. One example that was
mentioned was that different users might have the same query (e.g. ”Java”) to search
for different information (e.g. the Java island in Indonesia or the programming
language). They also mentioned that the information need of a user might change
over time. The same user that queried ”Java” might have been referring to the
island the first time around, but was later referring to the programming language.
Most common IF models take advantage of the query to create a user behaviour
model, but since queries are usually short, the model becomes very impoverished
[4]. The purpose of Shen et al.'s paper was to create and update a user model based on implicit feedback and use the model to improve the accuracy of ad hoc retrieval, i.e. short-term information needs. The implicit user models created by the authors are highly reminiscent of the strategies that Joachims et al. [11] composed.
Among many things, Shen’s model [4] accounted for viewed search results that were
ignored by the user and re-ranked the search results that were yet to be presented
to the user. Evaluation of search results for their user behaviour model was done
by calculating precision and recall for documents at certain listings. Both precision
and recall were better than Google’s search engine for the top documents in the
case of 32 query topics.
Liu et al. [22] created a user behaviour model based on the user's dwell time on a webpage. By extracting multiple data points of dwell time coupled with how the user reacted, it was possible for the authors to create bell-curve distributions that
could predict the most probable action of the user.
One of the oldest articles in the literature study is Oard and Kim’s [5] article
on using implicit feedback in recommender systems. Even in this article, a user
behaviour model was proposed to abstract the informational need of the user and
how it would implicitly help the system.
Chapter 3
Methodology
This chapter covers the different methods that were used in order to solve the
problem at hand. Subjects such as database configurations, heuristics and tests are
brought up in this chapter.
One of the main challenges of this Master’s thesis was to model and implement
the heuristics that were used for ranking document results. When this was accomplished, the author conducted the proposed evaluation to train and test the ranking
algorithm with the designed heuristics.
3.1 Heuristics
Two different heuristics were implemented and below are the full descriptions of
both heuristics.
3.1.1 Field value boost
This heuristic is the fundamental value-boost heuristic that is the cornerstone of
the Master's thesis. It works by keeping track of decimal values that act as boost-values for the aforementioned Solr qf-parameter. All fields have a decimal value of 1.0 in the beginning, but this changes depending on what kind of metadata the user prefers. If the user were to click on links whose titles are highly similar to the user's query, the title-field is associated with a higher decimal value, and this leads to future search results where the titles of the top results are highly similar to the query. Below is the formula for calculating the new value for a field,
when it has been clicked:
$$best\_field := choose\_most\_similar\_field(query,\ chosen\_document.fields) \quad (3.1)$$

$$best\_field.value := \frac{best\_field.value}{\text{sum of scores on all fields} + best\_field.value} \quad (3.2)$$
Note that best_field.value on the right-hand side of the second equation refers to the current decimal value of best_field. This current value is overwritten after
the second equation is executed.
Furthermore, Equation (3.1) can be written in more detail, since it is meant
to calculate what part of the document is the most similar when compared to the
user’s search query:
Data: chosen_document, user_query
Result: best_field
best_field := NIL;
best_score := -∞;
for each current_field in chosen_document do
    score := cosine_similarity(current_field, user_query);
    if score > best_score then
        best_score := score;
        best_field := current_field;
    end
end
return best_field;
Algorithm 1: Choosing the most similar field
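A compact Python sketch of Algorithm 1 combined with the update in Equation (3.2) is given below. The similarity argument stands for any string-similarity function, for instance the cosine similarity from Section 2.1.1; it is an assumption of the sketch and not part of the thesis code base.

def boost_clicked_document(field_values, clicked_document, user_query, similarity):
    # field_values: field name -> current boost value (all start at 1.0)
    # clicked_document: field name -> field text of the clicked document
    # Eq. (3.1): pick the metadata-field whose text is most similar to the query.
    best_field = max(clicked_document,
                     key=lambda f: similarity(clicked_document[f], user_query))
    # Eq. (3.2): overwrite the boost value of the winning field.
    total = sum(field_values.values())
    field_values[best_field] = field_values[best_field] / (total + field_values[best_field])
    return best_field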
3.1.2 Field value boost with dampening effect
This heuristic is fundamentally the same as the previous one, but it has another
feature to it. This feature is that the value that is to be added to the old field
value has a dampening constant. This dampening constant introduces a behaviour
where several updates on the same field are not as aggressive as in the previous
heuristic. Below is the formula for applying the dampening effect to the value of
the best_field¹.

$$best\_field.value := \frac{best\_field.value}{\text{sum of scores on all fields} + best\_field.value} \cdot dampening$$
where dampening can be written as:

$$dampening = \frac{1}{field.number\_of\_times\_updated}$$
The expression above implies that increments of a field’s value will diminish when
the field has been updated many times. Without the dampening, incrementing the
value of a field that has not been of much interest, until now, is not that significant.
This is because the formulas take the value of a field and divide it by the sum of
all fields’ values. This fraction becomes small when the total sum (the denominator)
is large and the current value of the selected field (the numerator) is small because of the stated reasons. The dampening tries to remedy the problem of aggressively increasing values, and the following chapter shows whether the dampening introduces any significantly different behaviour.

¹ best_field is assigned in the same way as in Equation (3.1).
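A sketch of the dampened variant, under the same assumptions as the previous sketch, could look as follows; update_counts is a hypothetical helper dictionary that records how many times each field has been updated.

def boost_with_dampening(field_values, update_counts, best_field):
    # dampening = 1 / (number of times the field has been updated so far)
    update_counts[best_field] = update_counts.get(best_field, 0) + 1
    dampening = 1.0 / update_counts[best_field]
    total = sum(field_values.values())
    field_values[best_field] = (field_values[best_field] /
                                (total + field_values[best_field])) * dampening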
3.2 Gathering Data
Just as pointed out by the literature study, it is difficult to simulate users’ information need [10]. Therefore, it is beneficial for the quality of data if it is possible
to procure data where genuine information need is guaranteed. This kind of data
was given by the INEX forum in the form of a qrel file accompanied with a file that
contains numerous topics and queries.
Figure 3.1. A snippet from the qrel file
The structure of the qrel file is easy to follow and is illustrated in Figure 3.1.
The first column represents the IDs of a topic. The topics are extracted from a topic
file and an example topic is illustrated in Figure 3.3. The only piece of information
in the topic file that is relevant to this Master’s thesis is the query found inside
each topic, which is found with the help of the topic id. These queries are actual
queries written by actual users of the LibraryThing service2 (LT). The ’Q0’ string
was disregarded for the experiments.
The next relevant column of data represents the books’ IDs in the LT system.
To be able to make use of these book IDs, one had to first convert them to Amazon
ISBNs, which was possible since the author was given an LT-Amazon translation
table. A snippet of the translation table is illustrated in Figure 3.2.
The last piece of information is listed after the book id. This column represents
the relevance scores for the books. The book with the highest score for a certain
topic is considered to be the most relevant book in regards to the query that the
topic represents.
² www.librarything.com. A website where users can create and explore personal libraries.
Figure 3.2. A snippet from the translation table between ISBNs and LT ids, respectively. Note that one LT id is usually mapped to several ISBNs.
Figure 3.3. A snippet from the topic file
The qrel file was used as a list of queries. These queries were used as input to
the search engine and also as a way to find out what books were to be inserted in
the database. The qrel file also played an integral part during the evaluation.
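As a sketch, the qrel data described above could be read into a lookup structure along the following lines; the exact column layout of the real INEX file may differ, so the parsing is an assumption.

def load_qrels(path):
    # Each line: topic id, the literal string 'Q0' (ignored), LT book id, relevance score.
    relevance = {}
    with open(path) as qrel_file:
        for line in qrel_file:
            topic_id, _q0, book_id, score = line.split()
            relevance[(topic_id, book_id)] = int(score)
    return relevance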
Another aspect of data gathering is the structure of the books. The set of
books given by the INEX forum consisted of 2.8 million books with metadata from
Amazon and LibraryThing. The metadata from Amazon was formal and contained
information about title, author, publisher and so forth. The metadata from LibraryThing was user-contributed and usually contained information about awards,
book characters and places [23].
Figure 3.4. A snippet from a file that is formatted according to the Solr schema
Figure 3.4 depicts a snippet of a book that has been formatted in a way that
made it compatible for upload to the search engine’s database. Certain numeric
and trivial fields were omitted from all of the books. Figure 3.4 is just a simple
example of what the fundamental structure of a formatted document looked like.
3.3 Evaluation
The evaluation was essential for this Master’s thesis. The evaluation was used to
find out if the heuristics significantly affect the behaviour of the search engine. The
following paragraphs should contain enough information for the reader to be able
to reproduce the stages of the evaluation.
Just below this paragraph, the reader can observe the pseudo-code that describes
the procedures of the evaluation. As can be seen, not only was the evaluation used
to find the benefits of the heuristics, but it was also used to actually train the
heuristic by simulating the behaviour of a user.
for n := 1 to 100 do
    shuffle set of queries;
    for each q in queries do
        input q in search engine;
        couple search results with their relevance scores from qrel file;
        calculate mean-relevance@10;
        "click" on search result with highest maximum of relevance;
        update field values with regard to q and "clicked" link;
    end
    extract mean-relevance for latest query;
end
Algorithm 2: Evaluation algorithm
A thorough description of every part in Algorithm 2 is given below, along with
some omitted parts such as plotting the results.
Shuffling the query set The reason why the query set was shuffled was that
the author needed to exclude the risk of results being dependent on the queries’
order in the set.
Using relevance score from qrel file Once a query was used as input to the
search engine, search results were acquired. The qrel file played a very important
role in this stage; the documents that were considered relevant to the given search
query were given relevance scores. These relevance scores were then coupled with
the search results. For instance, if one searched for the query ”bible” and also had
access to a qrel file where it explicitly said that the book with ISBN ”1585161519”
has a relevance score of 10 in relation to this query, this score would be transferred
from the qrel file and coupled with the corresponding search result3 . An illustration,
that is unrelated to aforementioned example, of how scores were used can be seen
below. Note that scores in Figure 3.5 are not actually visible for the end-user, but
they are visible in this picture for the purpose of clarification.
³ For this example, assume that 10 equates to "highly relevant".
Figure 3.5. Picture showing the search GUI in action. Also illustrates the example
of how scores are used and what link the user clicked on.
User simulation One of the crucial parts of the evaluation was that it was also
responsible for simulating the behaviour of a user in order to choose a document
among the search results. The choice of a document was considered a click and was
then used, together with the query, to let the heuristic update the field values. The
behaviour of the simulated user was based on the following hypothesis:
Hypothesis 2: the user will click on the document that has the highest maximum of relevance score among the top 10 search results.
If one were to apply this hypothesis to Figure 3.5, it would mean that the user
would click on the second search result. The reason for this choice is because the
chosen document is among the top 10 search results, the score of 4 is the maximum
of the available relevance scores, and the score of 4 has the highest position among
all scores of 4.
Measure of performance In order to get a perception of how well the heuristic
was performing for each query, a measure of performance needed to be proposed.
The one that the author proposed is called mean-relevance@10 (MR@10). The idea behind this measurement is mostly explained by its name:

$$MR@10 = \frac{\sum_{i=1}^{10} document_i.relevance}{10} \quad (3.3)$$

In other words, MR@10 is simply the mean relevance of the first 10 search results. The relevance that is used in Equation 3.3 refers to the relevance
that was extracted from the qrel file. If a relevance score is not available for a
certain search result, its relevance score is simply set to 0.
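Equation (3.3) translates directly into code; the sketch below assumes a dictionary mapping document identifiers to their qrel relevance scores, with missing judgements counted as zero.

def mean_relevance_at_10(ranked_doc_ids, relevance_by_doc):
    # MR@10: mean qrel relevance over the ten highest-ranked results.
    top10 = ranked_doc_ids[:10]
    return sum(relevance_by_doc.get(doc_id, 0) for doc_id in top10) / 10.0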
Plotting the results For the reader to make sense of the evaluation results, the
data was visualized into a number of charts.
For every shuffle of the query set, MR@10 was calculated for every query that was handed to the search engine. This allows the author to plot the MR@10 for every available query, enabling the reader to see how the MR@10 varied after every field value update for all of the heuristics.
The aforementioned plotting procedure does not lead to the final result, however. The interesting data was created due to the shuffles of the query set. At the end of running all of the queries, the latest calculated MR@10 was stored as a unique value for that specific query order. This means that after 100 shuffles were performed, the author would have acquired 100 unique MR@10 scores. These scores were plotted as bell-curve plots, which were constructed for all of the heuristics.
Using these bell-curve plots and their accompanying data, measurements such as the standard deviation were calculated. Finally, the author performed a paired t-test to find out if the heuristic led to a behaviour that was significantly different from the regular baseline search engine.
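Putting the pieces together, Algorithm 2 could be sketched in Python as below. The search, relevance_for and update_fields callables are placeholders for the Solr call, the qrel lookup and the heuristic update; they are assumptions made for the sketch and not functions from the thesis code base.

import random

def run_evaluation(queries, search, relevance_for, update_fields, n_shuffles=100):
    final_scores = []
    for _ in range(n_shuffles):
        random.shuffle(queries)
        for query in queries:
            top10 = search(query)[:10]
            scores = [relevance_for(query, doc) for doc in top10]
            mr_at_10 = sum(scores) / 10.0
            # Hypothesis 2: "click" the highest-placed result with the maximum relevance.
            clicked = top10[scores.index(max(scores))]
            update_fields(query, clicked)
        final_scores.append(mr_at_10)          # MR@10 of the last query in this shuffle
    return final_scores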
3.4 Technical Configurations
The following text goes through the technical components in the system and reveals
what they were used for and how they were configured.
3.4.1 Search Engine
The search engine used for this Master’s thesis is called Apache Solr4 . Solr is one of
the products that resulted from the Apache Lucene project and offers features like
full-text search, hit highlighting and so forth. Full-text search is the most relevant
feature for the Master’s thesis.
The differences in configuration between the default Solr product and the one
that was used for the Master’s thesis are not many. The major, and important,
differences lie in the schema file. The major changes concern which fields are to be considered. The documents that were added to the search engine's database needed to mostly comply with the fields listed in the schema. A snippet of the fields contained in the schema can be found below.
⁴ http://lucene.apache.org/solr/
Figure 3.6. A snippet of Solr’s schema configuration.
Figure 3.6 shows a few of the fields that can exist in the documents that were
submitted to the search engine’s database. A closer inspection of the figure reveals
that the field ”isbn” was the only field that was actually required. This means that
a document was disregarded if its isbn field did not exist.
The value of the ”indexed” variable told the user if the field was supposed
to be searchable. Most fields had this variable set to ”true”. Certain numeric
values like ”weight” were not searchable, since it seemed highly unlikely that a
user would search for books by solely entering their weight. Although some fields
are not searchable, they were most likely ”stored” and the corresponding variable
was set to ”true”. This means that the value of the field was stored and could be
presented if the document was found with the help of a field that was indexed. The
”multiValued” field was very useful when a field could contain several values. An
example of this was the field that contains the author(s). If a book has several authors, it was convenient if the field could support the storage of several values.
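For illustration, a few field definitions of the kind described above could look as follows in a Solr schema file; the names and types are examples and not the exact schema used for the thesis.

<!-- "isbn" is the only required field; "weight" is stored but not searchable. -->
<field name="isbn"    type="string"       indexed="true"  stored="true" required="true"/>
<field name="title"   type="text_general" indexed="true"  stored="true"/>
<field name="weight"  type="float"        indexed="false" stored="true"/>
<field name="creator" type="text_general" indexed="true"  stored="true" multiValued="true"/>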
3.4.2 Jellyfish Service
Jellyfish is a Findwise product that acts as the service layer between the data and
front-end layer. Because of Jellyfish, the business logic is abstracted, which leads to
a case where the front-end design does not need to care for the business logic and
can instead focus on rendering and handling user interactions 5 .
The Jellyfish project was imported to the author’s workspace and was configured from there. A template came with the project and minimal configuration was
required. The substantial changes were made in an xml-file that was related to how
the Solr instance would work. Jellyfish encapsulates Solr and was responsible for
creating the queries that take boost values into consideration.
⁵ http://www.findwise.com/services/glossary/jellyfish-component. A brief summary of Jellyfish.
Figure 3.7. A snippet of the JellyFish configuration that changes the Solr query.
The bean in Figure 3.7 that is called ”qf” appended a qf-string to the Solr
query. Its input data came from a file called data.properties, but the reader should
be informed that this data was overwritten by a Java method that loads the proper
qf-values from the author’s own database. The unnecessary input of aforementioned
file was due to technical issues. In the end, JellyFish appended this line to the URL
and passed it to Solr.
3.4.3 Searching interface
The reader was given a brief glimpse of the search interface in Figure 3.5. The
purpose of the Jellyfish component was to take care of the system’s business logic
in order to make the front-end as lightweight as possible. The search interface was extremely lightweight and users would interact with, at most, three components:
the search field, the view where search results are presented, and the buttons that
are used to navigate to the next page. Once the ”Hitta” button was clicked, the
query that was given by the end-user was submitted to the Jellyfish component,
which added the parameters specified in the Solr configuration files, i.e. qf.
3.4.4 Database
Field values The database for the Master’s thesis was a simple H2 Database that
was used as a key-value store for the values used in the qf-parameter. Every time a
heuristic updated a field value, it uploaded the value to the appropriate table. How
values were stored is illustrated in Figure 3.8.
Figure 3.8. Illustration showing some of the fields and how they are stored in the
database.
Click- and Search-logging Another reason for using a database was to log the
clicks and searches made by a user. The database also offered a table that revealed
which one of the clicks was related to a certain query. This data was used to find
out what a user clicked on in response to his/her query and this tuple of data was
used as input data to one of the heuristics.
Chapter 4
Results
The results shown in this chapter are different illustrations of the data acquired
through the evaluation. The first section will illustrate plots that show how the
mean-relevance improved after every query and also the plots that illustrate the
normal probability distribution of mean-relevances for the heuristics. The final
section is dedicated to an analysis of the results. The analysis will be in the form of
a paired two-tailed t-test with a high significance level and is used to see if there are
any significant differences between the baseline search algorithm and the heuristics
created by the author.
4.1 Plots
Below is a group of plots that show how the mean-relevance differed after every
query search for all of the heuristics and the baseline algorithm.
Note that the set of queries was shuffled 100 times for each heuristic. Therefore,
each of the figures shown in Figures 4.1-4.3 was randomly chosen from three different sets of 100 different plots. They are essentially snapshots of the large test runs that were run to create the distributions seen in Figures 4.4-4.6.
These distributions attempted to illustrate the normal probability distribution
for each heuristic. The construction of these plots was previously explained and
readers are referred to Section 3.3 to find out how the construction of the plots was
performed.
Figure 4.1. Plot showing how the mean-relevance differed after every searched query
for the Value boost heuristic that uses the dampening effect.
Figure 4.2. Plot showing how the mean-relevance differed after every searched query
for the Value boost heuristic without the dampening effect.
Figure 4.3. Plot showing how the mean-relevance differed after every searched query
for the Baseline algorithm. Field values are not boosted, which is the equivalent of a
regular Solr search being executed.
Figure 4.4. The normal probability distribution for the mean-relevances when using
the valueboost heuristic with dampening.
Figure 4.5. The normal probability distribution for the mean-relevances when using
the valueboost heuristic without dampening.
Figure 4.6. The normal probability distribution for the mean-relevances when using
the baseline algorithm.
Figure 4.7. The normal probability distribution for the mean-relevances of all the
previously illustrated plots in the same figure.
By looking at Figure 4.7, the reader should be able to observe that the differences
between the heuristics and the baseline algorithm are not large. Observing the non-correlation with the naked eye is not enough. The non-correlation will be further
investigated and discussed in Section 4.2.
4.2 Analysis
Observing Figures 4.1 - 4.3 one cannot see any differences in the evolution of the
mean-relevance. The behaviour seems erratic for all of the heuristics. Although this
data is omitted, the author can mention that most search queries usually returned
well over 90 % of the stored document set as search results. These large sets of
search results might have led to essential documents being pushed down beyond
the top-10 ranking. If the highly relevant documents were placed somewhere beyond
the first page several times, it means that the actual heuristics failed to make the
mean-relevance converge to a high value. This observation is further touched upon
in the concluding remarks.
In order to see how values are distributed, the probability distribution curves
are shown in Figures 4.4 - 4.7. The most probable value of the mean-relevance
is 0.6 for all of the heuristics, and the similarity of their distributions is clearly
seen in Figure 4.7. The similarity can also be observed by comparing the means
and standard deviations for all of the distribution curves, since these values are also
similar. To ensure that there is no significant difference between the distributions of
mean-relevances, two-tailed paired t-tests were performed. Below are the different
cases that were tested:
Test case 1 Significance between the value boost heuristic without dampening
and the baseline algorithm.
Test case 2 Significance between the value boost heuristic with dampening and
the baseline algorithm.
When performing a paired two-tailed t-test, one must specify a null hypothesis
and an alternative hypothesis before performing the actual test. The null hypothesis states that there is no difference in the means of the sample sets, while the
alternative hypothesis states the contrary. The null and alternative hypotheses are
mutually exclusive.
The following subsections show the reader the null and alternative hypotheses,
and the outcomes of the tests. The chosen significance level for both test cases was
0.05 (5 %).
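As an illustration of how such a test can be run, the sketch below uses SciPy's paired t-test (which is two-tailed by default); heuristic_scores and baseline_scores would be the 100 final MR@10 values collected for a heuristic and for the baseline, paired run by run.

from scipy import stats

def differs_from_baseline(heuristic_scores, baseline_scores, alpha=0.05):
    # Paired two-tailed t-test; reject the null hypothesis if p < alpha.
    t_stat, p_value = stats.ttest_rel(heuristic_scores, baseline_scores)
    return p_value < alpha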
4.2.1 Test case 1
Null hypothesis There is no difference in the sample means between samples from the value boost heuristic's mean-relevances and the corresponding samples
from the baseline algorithm.
Alternative hypothesis There is a difference between the sample means due to
a non-random cause.
Results The two-tailed paired t-test showed that there was no significant difference between the sample means. The alternative hypothesis must be rejected.
4.2.2 Test case 2
Null hypothesis There is no difference in the sample means between samples from the value boost dampening heuristic's mean-relevances and the corresponding samples from the baseline algorithm.
Alternative hypothesis There is a difference between the sample means due to
a non-random cause.
Results The two-tailed paired t-test showed that there was no significant difference between the sample means. The alternative hypothesis must be rejected.
This test confirms, with high certainty, that there were no significant differences
in precision between the heuristics and the baseline algorithm. One of the reasons
could have been the fact that the heuristics can be considered as somewhat simple
algorithms with no actual scientific support for how boost-values are incremented.
Just as Joachims and Radlinski [2] mentioned, simple heuristics are doomed to fail.
Chapter 5
Conclusions
This is the final chapter of the Master’s thesis and aims to answer the research
questions that were formulated in the beginning of this report. Once the research
questions have been answered, future improvements on this Master’s thesis will be
brought up.
5.1 Research questions
Do the evaluations of the search engine heuristics indicate that the author's hypothesis might be accurate? Hypothesis 1 stated that the proposed solutions would be significantly better at presenting the users' sought documents than the baseline search engine¹. Statistical tests were made that used data acquired through an evaluation process. These tests showed, at a significance level of 5 %, that there was no difference between the mean-relevances given by the proposed solutions and the baseline search engine. Any differences that might have been observed were simply too small and probably the result of random factors.

¹ The hypothesis actually mentioned "the unmodified search engine", but this is the same as "the baseline search engine".
What are the desirable/undesirable effects of the solution? Looking at
Figure 4.7, the reader could see that it was most likely that the mean-relevance for
an executed search would have a value of 0.6. This applied for all of the proposed
solutions and the baseline search engine. A value of 0.6 equates to a scenario where
some of the documents on the first page are not relevant to the search query. In
other words, any value that is well below 1.0 is considered to be extremely poor.
This means that one of the undesirable effects of the proposed solutions was that
they do not change the behaviour of favouring documents that are non-relevant to
the search query.
Another undesirable effect was the amount of returned documents. A search
query containing the word "books" would, in this case, return more than 90 % of the indexed documents, which probably pushed the relevant documents away from the top-10 list.
Looking at the plots of mean-relevance in Figures 4.1 - 4.3, another undesirable
effect that can be observed was that, for all cases, there was no convergence of mean-relevance. The evolution of mean-relevance looks highly random and the author
believes that the reason for this lies in the fact that the proposed solutions were
too simple. Although actual users were not used for the evaluation, the behaviour
of the simulated user still gave rise to the infamous trust bias, since the simulated
user only clicked a document that was placed somewhere on the first page.
After the execution of all queries, only around half of the metadata fields were actually updated. Fields like ”isbn”, ”publisher”, ”dewey” and so forth were never updated and retained their original value after the evaluations. This introduces doubt
over the necessity of having to index all of the metadata-fields into the database.
Is it reasonable to assume that the proposed solutions actually improve
the users’ search experience? The evaluation in this Master’s thesis has shown,
with high certainty, that there were no significant differences in accuracy between
the proposed solutions and the baseline search engine. Therefore, the proposed solutions do not improve the users’ search experience and there is no point in replacing
the baseline search engine in favour of the proposed solutions.
5.2 Future improvements
In this final section, all of the future improvements for this Master’s thesis are listed
and explained.
Calculating similarity For every pair of executed search query and click, similarities between the metadata-fields of the clicked document and the search query
were calculated. The metric used to calculate similarity was the common cosine
similarity. One way of improving the Master’s thesis could be to introduce different
ways of calculating similarities between, essentially, two strings.
Another aspect of similarity, that can be changed, is to not just look at what
metadata-fields are similar to the query, but also look at how much of a similarity
was found. The current setup only allowed one metadata-field, the most similar one, to have its boost-value updated, but it is unfair to completely disregard the
field that was the second most similar field or even the third most similar field.
Therefore, the code used to update the boost-values needs to take into account that
several fields might have affected the user’s decision of choosing a document.
Another aspect that can be changed is the increment rate of the fields. Even if only one field is found to be similar to the search query, one could introduce a function that reduces or amplifies the increment value based on how strong the similarity between that metadata-field and the search query is.
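Both ideas can be combined into a single update step: every sufficiently similar field gets its boost-value incremented, and the size of the increment is scaled by the strength of the similarity. The sketch below illustrates this under the same assumptions as the previous example (it reuses cosine_similarity and the hypothetical clicked document); the base increment and the cut-off are arbitrary illustrative values.

```python
def update_boosts(query, document_fields, boost_values, base_increment=0.1, cutoff=0.2):
    """Increment the boost-value of every field whose similarity to the query exceeds
    the cut-off, scaled by how similar the field actually is to the query."""
    for field, text in document_fields.items():
        similarity = cosine_similarity(query, text)
        if similarity >= cutoff:
            boost_values[field] = boost_values.get(field, 1.0) + base_increment * similarity
    return boost_values

boosts = {"title": 1.0, "subject": 1.0, "publisher": 1.0}
update_boosts("science fiction books", clicked, boosts)
print(boosts)  # both "title" and "subject" are incremented, not just the single most similar field
```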
User-centric evaluations
The 380 queries that the author acquired from INEX and used for the evaluations came from 380 different LibraryThing users. The evaluation was carried out by simulating the behaviour of a single user, but since the 380 queries came from several users, the evaluation was in effect attempting to mimic the behaviour of a typical user in the given domain, LibraryThing. In other words, the evaluation attempted to create a search behaviour model for the entire LibraryThing domain. This did not work well, and the proposed solutions did not differ in behaviour compared to the baseline search engine.
Another way of evaluating the heuristics could be to make the evaluations user-centric; the evaluation should try to create individual user behaviour models in order to see if the proposed solutions are good at creating a model of the information need of a single user.
Not indexing all of the metadata
Looking at the values of the metadata-fields after the evaluation, some of the fields were never updated, and indexing them into the search engine's database might have been redundant. One could investigate this aspect by only indexing the metadata that is likely to be searched for by end-users. This might reduce the number of documents returned and make it more likely that the relevant documents end up in, or close to, the top-10 list of search results.
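A minimal sketch of such a field whitelist is given below. The chosen field names are assumptions made for the sake of illustration, and the actual indexing call would depend on the search engine client being used.

```python
# Hypothetical whitelist of metadata-fields that end-users are likely to search.
FIELDS_TO_INDEX = {"title", "subject", "author", "tags"}

def prune_document(document):
    """Keep only the whitelisted metadata-fields before indexing the document."""
    return {field: value for field, value in document.items() if field in FIELDS_TO_INDEX}

raw_document = {"title": "A history of science fiction books",
                "isbn": "978-0-00-000000-0",
                "dewey": "823.0876",
                "subject": "literature science fiction"}
print(prune_document(raw_document))  # only "title" and "subject" would be indexed here
```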
Taking other factors of implicit feedback into account
The heuristics that were evaluated in this Master's thesis only took the search query and the clicked search result into consideration. Since it has been mentioned that simple heuristics are doomed to fail [2], it would be interesting to evaluate heuristics that take several factors of IF into account. One example is to use the reading time of a document: if a user stays on the page of a document, it could mean that the document is of interest to the user.
Different authors in the literature review put this theory to the test and came to different conclusions [10, 18]. The study that found reading time to be a poor indicator of document relevance attributed this conclusion to the fact that its test subjects were not experts in the domain where they searched for documents. This was not the case in [18], where the users were experts in the search domain and had no trouble finding documents that were relevant to the queries they were given. There, reading time was used as a metric that significantly increased the performance of the search engine.
In light of this finding, future researchers should attempt to extend the proposed solutions of this Master's thesis to incorporate reading time as an additional factor.
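As an illustration of how reading time could be folded into the existing boost heuristics, the sketch below weights the increment by the dwell time of a click. It reuses cosine_similarity from the earlier sketch, and the threshold, cap and weighting function are assumptions rather than anything evaluated in this thesis.

```python
def dwell_time_weight(dwell_seconds, threshold=30.0, cap=3.0):
    """Map the time a user spent on a document page to a weight between 0 and cap.
    Clicks much shorter than the threshold count as weak evidence of interest."""
    return min(dwell_seconds / threshold, cap)

def update_boosts_with_dwell(query, document_fields, boost_values, dwell_seconds,
                             base_increment=0.1, cutoff=0.2):
    """Similarity-weighted boost update, additionally scaled by dwell time."""
    weight = dwell_time_weight(dwell_seconds)
    for field, text in document_fields.items():
        similarity = cosine_similarity(query, text)
        if similarity >= cutoff:
            boost_values[field] = boost_values.get(field, 1.0) + base_increment * similarity * weight
    return boost_values
```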
Different hypothesis for user behaviour
Hypothesis 2 concerned the behaviour of a user when he or she is about to click on a search result. Not enough data was gathered to deduce whether Hypothesis 2 actually increased the quality of the evaluation results. An improvement to this study could involve tests where different user behaviours are used for the evaluation.
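One concrete way of varying the simulated behaviour would be to swap in different click models. The position-biased model sketched below is only one possible assumption about user behaviour and is not the behaviour described by Hypothesis 2.

```python
import random

def position_biased_click(results, relevance, attention_decay=0.7):
    """Simulate a user who scans the result list top-down and becomes less likely
    to click the further down a result is ranked, even when it is relevant."""
    for rank, doc_id in enumerate(results):
        examine_probability = attention_decay ** rank
        if relevance.get(doc_id, 0) > 0 and random.random() < examine_probability:
            return doc_id
    return None  # the simulated user abandoned the result list without clicking
```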
References
[1] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. An Introduction to Information Retrieval. Cambridge University Press, Cambridge, England, 2009. URL http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf [Online; retrieved January 23rd 2014].

[2] Thorsten Joachims and Filip Radlinski. Search engines that learn from implicit feedback. IEEE Computer, 40(8):34–40, August 2007. URL http://luci.ics.uci.edu/websiteContent/weAreLuci/biographies/faculty/djp3/LocalCopy/04292009.pdf [Online; retrieved February 3rd 2014].

[3] Eugene Agichtein, Eric Brill, and Susan Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th Annual ACM International Conference on Research and Development in Information Retrieval (SIGIR ’06), 2006. URL http://www.msr-waypoint.com/en-us/um/people/sdumais/SIGIR2006-fp345-Ranking-agichtein.pdf [Online; retrieved January 29th 2014].

[4] Xuehua Shen, Bin Tan, and ChengXiang Zhai. Implicit user modeling for personalized search. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management, CIKM ’05, pages 824–831, New York, NY, USA, 2005. ACM. URL http://doi.acm.org/10.1145/1099554.1099747 [Online; retrieved February 3rd 2014].

[5] Douglas W. Oard and Jinmook Kim. Implicit feedback for recommender systems. In AAAI Technical Report WS-98-08, 1998. URL http://www.aaai.org/Papers/Workshops/1998/WS-98-08/WS98-08-021.pdf [Online; retrieved January 29th 2014].

[6] Findwise AB. The website of Findwise AB, May 2014. URL http://www.findwise.com [Online; retrieved May 1st 2014].

[7] Amit Singhal. Modern information retrieval: A brief overview. IEEE Data Eng. Bull., 24(4):35–43, 2001. URL http://singhal.info/ieee2001.pdf [Online; retrieved July 15th 2014].

[8] Felix Weigel, Klaus U. Schulz, and Holger Meuss. Ranked retrieval of structured documents with the s-term vector space model. In Norbert Fuhr, Mounia Lalmas, Saadia Malik, and Zoltán Szlávik, editors, Advances in XML Information Retrieval, volume 3493 of Lecture Notes in Computer Science, pages 238–252. Springer Berlin Heidelberg, 2005. URL http://dx.doi.org/10.1007/11424550_19 [Online; retrieved July 15th 2014].

[9] Apache. QueryFields function in Solr, May 2014. URL http://wiki.apache.org/solr/ExtendedDisMax#bf_.28Boost_Function.2C_additive.29 [Online; retrieved May 5th 2014].

[10] Diane Kelly and Nicholas J. Belkin. Reading time, scrolling and interaction: Exploring implicit sources of user preference for relevance feedback. In Proceedings of the 24th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR ’01), 2001. URL http://comminfo.rutgers.edu/etc/mongrel/kelly-belkin-SIGIR2001.pdf [Online; retrieved January 23rd 2014].

[11] Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. Accurately interpreting clickthrough data as implicit feedback. In Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR), 2005. URL http://www.cs.cornell.edu/people/tj/publications/joachims_etal_05a.pdf [Online; retrieved January 23rd 2014].

[12] Gawesh Jawaheer, Martin Szomszor, and Patty Kostkova. Comparison of implicit and explicit feedback from an online music recommendation service. In Proceedings of the 1st International Workshop on Information Heterogeneity and Fusion in Recommender Systems, HetRec ’10, pages 47–51, New York, NY, USA, 2010. ACM. URL http://doi.acm.org/10.1145/1869446.1869453 [Online; retrieved February 3rd 2014].

[13] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM ’08, pages 263–272, Washington, DC, USA, 2008. IEEE Computer Society. URL http://dx.doi.org/10.1109/ICDM.2008.22 [Online; retrieved February 3rd 2014].

[14] Steffen Rendle and Christoph Freudenthaler. Improving pairwise learning for item recommendation from implicit feedback. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM ’14, pages 273–282, New York, NY, USA, 2014. ACM. URL http://doi.acm.org/10.1145/2556195.2556248 [Online; retrieved July 15th 2014].

[15] Frank Hopfgartner and Joemon Jose. Evaluating the implicit feedback models for adaptive video retrieval. In Proceedings of the International Workshop on Workshop on Multimedia Information Retrieval, MIR ’07, pages 323–331, New York, NY, USA, 2007. ACM. URL http://doi.acm.org/10.1145/1290082.1290127 [Online; retrieved February 3rd 2014].

[16] Shaghayegh Sahebi and William Cohen. Community-based recommendations: a solution to the cold start problem. In Workshop on Recommender Systems and the Social Web, RSWEB, 2011. URL http://d-scholarship.pitt.edu/13328/ [Online; retrieved February 3rd 2014].

[17] Philip Zigoris and Yi Zhang. Bayesian adaptive user profiling with explicit & implicit feedback. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, CIKM ’06, pages 397–404, New York, NY, USA, 2006. ACM. URL http://doi.acm.org/10.1145/1183614.1183672 [Online; retrieved February 3rd 2014].

[18] Jinmook Kim, Douglas W. Oard, and Kathleen Romanik. Using Implicit Feedback for User Modeling in Internet and Intranet Searching. College of Library and Information Services, University of Maryland, College Park, 2000. URL http://books.google.se/books?id=kgdFGwAACAAJ [Online; retrieved February 3rd 2014].

[19] Yuanhua Lv, Le Sun, Junlin Zhang, Jian-Yun Nie, Wan Chen, and Wei Zhang. An iterative implicit feedback approach to personalized search. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, ACL-44, pages 585–592, Stroudsburg, PA, USA, 2006. Association for Computational Linguistics. URL http://dx.doi.org/10.3115/1220175.1220249 [Online; retrieved February 3rd 2014].

[20] Georg Buscher, Andreas Dengel, Ralf Biedert, and Ludger V. Elst. Attentive documents: Eye tracking as implicit feedback for information retrieval and beyond. ACM Trans. Interact. Intell. Syst., 1(2):9:1–9:30, January 2012. URL http://doi.acm.org/10.1145/2070719.2070722 [Online; retrieved July 15th 2014].

[21] Ryen W. White and Georg Buscher. Text selections as implicit relevance feedback. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’12, pages 1151–1152, New York, NY, USA, 2012. ACM. URL http://doi.acm.org/10.1145/2348283.2348514 [Online; retrieved July 15th 2014].

[22] Chao Liu, Ryen W. White, and Susan Dumais. Understanding web browsing behaviors through Weibull analysis of dwell time. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, pages 379–386, New York, NY, USA, 2010. ACM. URL http://doi.acm.org/10.1145/1835449.1835513 [Online; retrieved July 15th 2014].

[23] INEX Forum. Information about the data set, May 2014. URL https://inex.mmci.uni-saarland.de/tracks/books/ [Online; retrieved May 5th 2014].