Temporal Web Dynamics and Its Application to Information Retrieval

Yi Chang, Fernando Diaz, Anlei Dong, Susan Dumais, Kira Radinsky, Milad Shokouhi
Feb 4, 2013
WSDM 2013 Tutorial
Web content dynamics
WSDM 2013 Tutorial
Schedule

Introduction (9:00-9:15)
Modeling Dynamics
  9:15-10:15   Web content dynamics [Susan]
  10:15-11:15  Web user behavior dynamics [Milad]
  11:15-11:30  Break
  11:30-13:00  Spatio-temporal analysis [Fernando]; Methods for evaluation
Lunch (13:00-14:30)
Applications to Information Retrieval
  14:30-15:45  Temporal NLP [Kira]; News event prediction
  15:45-16:00  Break
  16:00-17:45  Time-sensitive search [Yi]; Time-sensitive recommendations [Anlei]
Wrap-Up (17:45-18:00)
Time and Time Again …

Time is pervasive in information systems:
  New documents appear all the time
  Document content changes over time
  Queries and query volume change over time
  What's relevant to a query changes over time
    E.g., U.S. Open 2013 (in June vs. Sept; before, during, and after the event)
  User interaction changes over time
    E.g., anchor text, "likes", query-click streams, social networks, etc.
  Relations between entities change over time
    E.g., "President of the U.S. is <>" (in 2012 vs. 2004)

… yet, most information retrieval systems ignore time!
Web Content Dynamics

Overview
  Change in "persistent" web documents
    Characterizing content dynamics
    Systems and applications
  Change in "real-time" content streams
    Characterizing content dynamics
    Systems and applications
  Change in Web graphs
    Web graph evolution
    Authority and content over time
Content Dynamics

Easy to capture, but … few tools or algorithms support dynamics.
Content Dynamics: Content Changes
[Figure: snapshots of page content across yearly archives, 1996-2009]
User Visitation/Re-Visitation
[Figure: user visitation and re-visitation patterns over time]

Today's search and browse experiences, however, ignore …

Content Dynamics
[Figure: the (term, doc, time) cube]
  Traditional IR: a single snapshot
  Word/query trends: aggregates over docs
  Document change: aggregates over terms
  (Word, Document) trends
Content Dynamics

Types of content
  Persistent documents (e.g., Web pages that persist over time)
  Real-time streams (e.g., Twitter, Facebook, blogs)
  Somewhere in between (e.g., the Web, Wikipedia)

How content change is discovered
  Crawling
  Feeds
  Wikis
Web Crawling: Cho & Garcia-Molina

Crawled 720k pages (from 270 popular sites), once per day, for 4 months.

How often does a web page change?
  23% change every day; 30% never change
  Differs by domain

What is the lifespan of a page?
  ~10% < 1 week; 50% > 4 months

Model when a page will change:
  Poisson process: a sequence of random events occurring independently at a
  fixed rate λ over time; the inter-change interval has PDF f(t) = λe^(−λt)
  Also, Radinsky & Bennett (WSDM 2013)
  Used to improve crawling policy

J. Cho and H. Garcia-Molina. The evolution of the web and implications for an incremental crawler. VLDB 2000.
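The Poisson change model on the slide translates directly into a refresh-scheduling heuristic; a minimal sketch (the observation data are illustrative, not from the study):

```python
import math

def estimate_change_rate(change_times):
    """MLE of the Poisson rate lambda: number of observed changes
    divided by the length of the observation window."""
    if len(change_times) < 2:
        raise ValueError("need at least two observed changes")
    span = change_times[-1] - change_times[0]
    return (len(change_times) - 1) / span

def prob_changed_within(rate, delta):
    """P(page changed at least once in the next `delta` days)
    under a Poisson process: 1 - exp(-rate * delta)."""
    return 1.0 - math.exp(-rate * delta)

# A page observed to change on days 0, 2, 4, 6, 8 -> rate 0.5 changes/day.
rate = estimate_change_rate([0, 2, 4, 6, 8])
print(rate)  # 0.5
print(round(prob_changed_within(rate, 7), 3))
```

A crawler can then prioritize pages whose probability of having changed since the last fetch exceeds a threshold.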
Web Crawling: Fetterly et al.

Crawled 150M pages (seeded from the Yahoo! home page), once per week, for 11 weeks.

How often does a web page change?
  67% never changed

When was the last successful crawl?
  On average, 88% of pages were fetched successfully on the last crawl
  Varies by domain (.cn 79%; .dk/.gov 95%)

How much does a web page change?
  On average: ~4% more than a medium amount, 20% a small amount, 10% no text, 67% no change

D. Fetterly, M. Manasse, M. Najork and J. Wiener. A large-scale study of the evolution of web pages. WWW 2003.
Web Crawling: Adar et al.

Crawled 50k pages (usage-sensitive sample), at least once per hour, for 5 weeks.

Usage-sensitive sample based on:
  Number of unique users
  Re-visits per user
  Inter-visit interval

Contributions:
  Summary page-level metrics
  Detailed within-page changes, term longevity
  Applications to ranking and UX (Diff-IE)

E. Adar, J. Teevan, S. T. Dumais and J. Elsas. The web changes everything: Understanding the dynamics of web content. WSDM 2009.
Adar et al.: Page-Level Change

Summary metrics:
  67% of visited pages changed; 63% of these changed every hour
  Popular pages change more frequently, but not by much
  .com pages change at an intermediate frequency, but by more

Change curves:
  Fix a starting point and measure similarity of the page over increasing time intervals
[Figure: Dice similarity vs. time from the starting point; curves drop quickly and then flatten at a "knot point"]
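The change curves above fix a starting snapshot and track Dice similarity over time; a minimal sketch with toy term sets:

```python
def dice_similarity(terms_a, terms_b):
    """Dice coefficient between two term sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def change_curve(snapshots):
    """Similarity of each snapshot to a fixed starting point,
    in the spirit of Adar et al.'s change curves."""
    start = snapshots[0]
    return [dice_similarity(start, snap) for snap in snapshots]

# Toy page snapshots drifting away from the first crawl.
snaps = [["a", "b", "c", "d"], ["a", "b", "c", "x"], ["a", "b", "y", "z"]]
print(change_curve(snaps))  # [1.0, 0.75, 0.5]
```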
Adar et al.: Within-Page Change

Term-level changes:
  Divergence from the norm (e.g., cookbooks, salads, cheese, ingredient, bbq, …)
  "Staying power" of a term in a page
[Figure: example term longevity graphs over time, Sep.-Dec.]
Change and Term Importance

Traditional IR uses tf.idf term weighting.
Time-aware term weighting:
  Elsas & Dumais, WSDM 2010: language model partitioned by term longevity (+ change prior on doc)
  Aji et al., CIKM 2010: importance of a term determined by its revision history (RHA)
  Efron, JASIST 2010: importance of a term determined by its deviation from a linear time series
Used to improve ranking.
Systems and Applications

Systems
  Internet Archive (e.g., WayBack Machine)
  Internet Memory Foundation
  Wikipedia
  Index structures to support time-travel search (Berberich et al., SIGIR 2007; Anand et al., SIGIR 2012)

Applications
  Crawling
  Ranking
  Query suggestion, burst detection, …
  User experience
Dynamics and User Experience

Content changes
  Diff-IE (Teevan et al., 2008)
  Zoetrope (Adar et al., 2008)
  Diffamation (Chevalier et al., 2010)
  Temporal summaries and snippets, …

Interaction changes
  Explicit annotations, ratings, "likes", etc.
  Implicit interest via interaction patterns
  Edit wear and read wear (Hill et al., 1992)
Diff-IE

The Diff-IE toolbar highlights changes to a page since your last visit.

J. Teevan, S. T. Dumais, D. Liebling and R. Hughes. Changing how people view change on the web. UIST 2009.

Interesting features of Diff-IE: new to you, always on, non-intrusive, in-situ.
Download: http://research.microsoft.com/en-
Diff-IE in Action

Expected changes vs. unexpected changes.

Diff-IE changes how people view change:
  People revisit more
  Revisits go to pages that change more
Zoetrope

A system that enables interaction with the historical Web.
  Select regions of interest (x-y location, DOM structure, text)
    E.g., a stock price, traffic status, headlines about WSDM, …
  Operators for manipulating streams of interest: filter, link, visualize

E. Adar, M. Dontcheva, J. Fogarty and D. Weld. Zoetrope: Interacting with the ephemeral web. UIST 2008.
Change in "Real-Time" Content Streams

Real-time streams of new content:
  Twitter, Facebook, YouTube, Pinterest, etc.
  News, blogs, etc.

And also …
  Wikipedia
  Commerce sites (e.g., eBay, Amazon, etc.)
Change in Twitter

Apr 2010: Twitter and the US Library of Congress enter into an agreement.
Jan 2013: status report from the Library of Congress archive:
  171 billion tweets (2006-2012)
  Tweets/year: 21B (2006-2010); 150B (2011-2012)
  Tweets/day (from Twitter): 200M (6/2011); 400M (6/2012); 500M (10/2012)
  Max tweets/second (from Twitter): 7K (Jan 1, 2011); 25K (Dec 11, 2012); 33K (Jan 1, 2013)

"The Library has not yet provided researchers access to the archive. Currently,
executing a single search of just the fixed 2006-2010 archive on the Library's
systems could take 24 hours. This is an inadequate situation in which to begin
offering access to researchers, as it so severely limits the number of possible
searches."
Temporal Analysis of Twitter

How different are tweets (and queries) day-over-day?
  Term (and top-term) distributions
  KL divergence, D(t+1 | t)
  Churn: fraction of the top r terms at time t that are not in the top r at t+1
  Out-of-vocabulary: fraction of the top r terms at t+1 that are not in the top r at t

Zooming in on the Oct 5th death of Steve Jobs (~midnight UTC) at 5-minute intervals:
  Significant churn
  Impacts methods for estimating term-level statistics
  During major events, sub-hour updates are important

J. Lin and G. Mishne. A study of "churn" in tweets and real-time search queries. ICWSM 2012.
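The churn and out-of-vocabulary rates defined above are straightforward to compute from per-day term counts; a minimal sketch with illustrative counts:

```python
from collections import Counter

def top_terms(counts, r):
    """The r most frequent terms at one time step."""
    return {t for t, _ in Counter(counts).most_common(r)}

def churn(top_t, top_t1):
    """Fraction of top-r terms at time t that drop out of the top r at t+1."""
    return len(top_t - top_t1) / len(top_t)

def oov(top_t, top_t1):
    """Fraction of top-r terms at t+1 that were not in the top r at t."""
    return len(top_t1 - top_t) / len(top_t1)

# Illustrative term counts on two consecutive days.
day1 = top_terms({"jobs": 9, "apple": 7, "iphone": 5, "news": 2}, 3)
day2 = top_terms({"jobs": 4, "steve": 8, "apple": 6, "news": 1}, 3)
print(churn(day1, day2), oov(day1, day2))
```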
Temporal Analysis of "Memes"

Tracking short distinctive phrases ("memes") in news media and blogs:
  90 million articles/blog posts over 3 months (Aug-Oct 2008)
  Variants of phrases clustered into memes

Global patterns:
  Probabilistic model that combines imitation and recency:
  Choose(j) ∝ f(n_j) · g(t − t_j)

Local patterns:
  Peak of attention in blogs lags the peak in news media by 2.5 hours
  Divergent behavior around the overall peak; handoff between news and blogs

J. Leskovec, L. Backstrom and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. KDD 2009.
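In the imitation-plus-recency model above, the probability of copying meme j is proportional to f(n_j)·g(t − t_j). A minimal sketch, using illustrative choices f(n) = n^α and g(Δt) = exp(−Δt/τ) (hypothetical functional forms, not the paper's fitted ones):

```python
import math

def choose_probs(counts, times, t, alpha=1.0, tau=3.0):
    """Probability of copying each meme j, proportional to
    f(n_j) * g(t - t_j) with f(n) = n**alpha (imitation) and
    g(dt) = exp(-dt / tau) (recency). alpha and tau are illustrative."""
    weights = [(n ** alpha) * math.exp(-(t - tj) / tau)
               for n, tj in zip(counts, times)]
    z = sum(weights)
    return [w / z for w in weights]

# Two memes: an older, heavily quoted one vs. a fresh, less-quoted one.
probs = choose_probs(counts=[10, 2], times=[0.0, 9.0], t=10.0)
print([round(p, 3) for p in probs])
```

With a short recency time-scale, the fresh meme wins despite its lower quote count, which is the handoff dynamic the model is meant to capture.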
Temporal Analysis of Blogs & Twitter

Patterns of temporal variation in short texts over time:
  Short text phrases (memes) from 170M news articles
  Hashtags from 580M Twitter posts

Spectral clustering:
  6 clusters for news/blogs
  6 clusters for Twitter

Predict the cluster type given early mentions.

J. Yang and J. Leskovec. Patterns of temporal variation in online media. WSDM 2011.
Static Graphs/Networks

Example graphs: web, tweets, emails, citation networks, etc.

Properties:
  #nodes, #edges, reciprocity, clustering coefficient, heavy tails for in- and
  out-degree distributions, size of the largest connected component, …
  Small-world phenomenon

Models for graph generation:
  Preferential attachment
  Copying
Evolution of Graphs over Time

ArXiv citation graph, patent citation graph, autonomous-systems graph, affiliation graph.

Empirical observations:
  Densification: average out-degree increases over time
  Densification power law: nodes vs. edges over time fit by a power law, e(t) ∝ n(t)^a
  Shrinking effective diameter

Generative model.

J. Leskovec, J. Kleinberg and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. KDD 2005.
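The densification exponent a in e(t) ∝ n(t)^a can be estimated as the slope of a log-log regression of edge counts against node counts; a minimal sketch on synthetic snapshots:

```python
import math

def fit_densification_exponent(nodes, edges):
    """Least-squares slope of log(e) vs. log(n), i.e. the exponent a
    in the densification power law e(t) ∝ n(t)^a."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic graph snapshots with e = n^1.5 exactly.
nodes = [100, 200, 400, 800]
edges = [n ** 1.5 for n in nodes]
print(round(fit_densification_exponent(nodes, edges), 3))  # 1.5
```

An exponent a > 1 means edges grow superlinearly in nodes, i.e. the graph densifies.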
Web Page Authority over Time

Query: wsdm

Why is older content ranked higher?
  Behavioral signals (in-links, clicks) are more prevalent for older pages.
Web Page Authority over Time

Modeling page authority over time:
  Multiple web snapshots (.ie domain from the Internet Archive, 2000-2007)
  Temporal page profiles (TPP) and temporal in-link profiles (TLP)
  Page freshness score, using exponential decay over time
  Freshness score controls authority propagation in a temporal random surfer model:
    The web surfer has a temporal intent (which controls the choice of target snapshot)
    The web surfer prefers fresh content
  Rank using a combination of content and temporal authority:
    Score(p) = γ · BM25 + (1 − γ) · TemporalAuthority

N. Dai and B. Davison. Freshness matters in flowers, food and web authority. SIGIR 2010.
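The ranking combination above is a simple linear interpolation of a content score and a temporal authority score; a minimal sketch (γ and the decay rate are illustrative, not the paper's tuned values):

```python
import math

def freshness(age_days, decay=0.01):
    """Exponential freshness decay over time; decay rate is illustrative."""
    return math.exp(-decay * age_days)

def temporal_score(bm25, temporal_authority, gamma=0.7):
    """Score(p) = gamma * BM25 + (1 - gamma) * TemporalAuthority."""
    return gamma * bm25 + (1 - gamma) * temporal_authority

print(temporal_score(bm25=12.0, temporal_authority=4.0, gamma=0.5))  # 8.0
print(round(freshness(30), 3))
```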
CoEvolution of Structure and Content

Three networks over time: Twitter, Second Life, Enron email.

Characteristics of network structure:
  Standard metrics, conductance, expectedness

Measures of network content:
  Similarity and divergence of language models

Empirical correspondence of network structure with content diversity and novelty:
  Conductance correlates with high diversity of content
  Expectedness correlates with content novelty

Simulation model:
  Node policy to forward based on recency, novelty and topicality

C.-T. Teng et al. Coevolution of network structure and content. ArXiv.
Resources

Web crawls:
  ClueWeb09, ClueWeb12 (static snapshots)
  Common Crawl
  PageTurner
  Internet Archive
  Publication/citation corpora

Content streams:
  Twitter API, Library of Congress
  Wikipedia (+ aggregate usage data)
  Blogs (TREC), Blogs (Spinn3r)
  Yahoo! update firehose (shutting down Apr 13, 2013)
References

J. Cho and H. Garcia-Molina. The evolution of the web and implications for an incremental crawler. VLDB 2000.
A. Ntoulas, J. Cho and C. Olston. What's new on the web? The evolution of the web from a search engine perspective. WWW 2004.
D. Fetterly, M. Manasse, M. Najork and J. Wiener. A large-scale study of the evolution of web pages. WWW 2003.
E. Adar, J. Teevan, S. T. Dumais and J. Elsas. The web changes everything: Understanding the dynamics of web content. WSDM 2009.
J. Teevan, S. T. Dumais, D. Liebling and R. Hughes. Changing how people view change on the web. UIST 2009.
E. Adar, M. Dontcheva, J. Fogarty and D. Weld. Zoetrope: Interacting with the ephemeral web. UIST 2008.
J. Lin and G. Mishne. A study of "churn" in tweets and real-time search queries. ICWSM 2012.
J. Leskovec, L. Backstrom and J. Kleinberg. Meme-tracking and the dynamics of the news cycle. KDD 2009.
J. Yang and J. Leskovec. Patterns of temporal variation in online media. WSDM 2011.
J. Leskovec, J. Kleinberg and C. Faloutsos. Graphs over time: Densification laws, shrinking diameters and possible explanations. KDD 2005.
N. Dai and B. Davison. Freshness matters in flowers, food and web authority. SIGIR 2010.
C.-T. Teng et al. Coevolution of network structure and content. ArXiv.
Temporal Dynamics of Queries & User Behavior
WSDM 2013 Tutorial

Outline

Query Dynamics
  Hourly, Daily & Monthly Trends
Categorizing Time-Sensitive Queries
  Spike, Periodicity
Modeling Query Dynamics
  Burst Detection, Time-Series
Temporal Patterns in User Behavior
  Re-finding, Long-term vs. Short-term
Temporal Patterns & Search Evaluation
  Predicting Search Satisfaction (SAT)
Query Dynamics

Temporal Analysis of Query Logs

Hourly analysis of queries [Beitzel et al., SIGIR 2004; JASIST 2007]
[Figures across a sequence of slides: hourly query volume and per-category trends]
Categorizing Query Dynamics
Burst, Periodicity

Categorizing Temporal Queries

Temporal query classes [Kulkarni et al., WSDM 2011]
[Figures across slides, with example queries:]
  Bald Britney
  iPad Mini
  US Election
  Iran Election
  Query dynamics versus content changes
  Click entropy vs. change in intent
Modeling Query Dynamics
Burst Detection, Time-Series

Burst Detection

[Vlachos et al., SIGMOD 2004]

"… there seems something else in life besides time, something which may
conveniently be called 'value,' something which is measured not by minutes or
hours but by intensity, so that when we look at our past it does not stretch
back evenly but piles up into a few notable pinnacles, and when we look at the
future it seems sometimes a wall, sometimes a cloud, sometimes a sun, but never
a chronological chart." – E. M. Forster
Burst Detection

Bursty and hierarchical structure in streams [Kleinberg, KDD 2002]
  Simple randomized model: the gap x between messages i and i+1 is distributed
  according to the "memoryless" exponential density f(x) = α·exp(−αx), with
  expected gap 1/α (α is the rate)
  A two-state model:
    State q0 (low) with rate α0 and state q1 (high) with rate α1
    The state changes with probability p and remains the same with probability 1 − p
    Each state sequence q induces a density function f_q over sequences of gaps
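The two-state model can be decoded with dynamic programming over the gap sequence; a minimal Viterbi-style sketch (the rates and a single symmetric switch cost are illustrative simplifications of Kleinberg's cost function):

```python
import math

def detect_bursts(gaps, a0=0.5, a1=2.0, switch_cost=1.0):
    """Viterbi decoding of a two-state gap model: a gap x in state q_i has
    density a_i * exp(-a_i * x); changing state pays `switch_cost`.
    Returns the most likely state sequence (0 = low rate, 1 = high/burst)."""
    rates = (a0, a1)

    def nll(rate, x):  # negative log-likelihood of one gap
        return rate * x - math.log(rate)

    cost = [0.0, 0.0]  # best cost of a path ending in each state
    back = []          # backpointers, one pair per gap
    for x in gaps:
        new_cost, ptrs = [], []
        for s in (0, 1):
            stay = cost[s]
            move = cost[1 - s] + switch_cost
            ptrs.append(s if stay <= move else 1 - s)
            new_cost.append(min(stay, move) + nll(rates[s], x))
        cost, back = new_cost, back + [ptrs]

    # Trace back the cheapest path.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for ptrs in reversed(back[1:]):
        state = ptrs[state]
        path.append(state)
    return list(reversed(path))

# Long gaps (quiet), then short gaps (burst), then quiet again.
print(detect_bursts([2.0, 2.5, 0.1, 0.1, 0.2, 3.0]))  # [0, 0, 1, 1, 1, 0]
```

The switch cost plays the role of the state-transition probability p: raising it suppresses short, spurious bursts.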
Burst Clustering

Burst clustering [Parikh and Sundaresan, KDD 2008]
  "Matterhorn" bursts: new products
  "Cuestas" bursts: limited release followed by wide-spread availability
Time-Series

A time-series is a set of discrete or continuous observations over time.
Applications:
  Data modeling
  Forecasting
Examples:
  Sales figures
  Student enrolment
  CO2 rate
  Query popularity
Time-Series (Single Exponential Smoothing)

The data points are modeled with a weighted average.
  y_t, s_t, ŷ_t: respectively the actual, smoothed and predicted values at time t
  λ: smoothing constant
  Smoothing: s_t = λ·y_t + (1 − λ)·s_{t−1}
  Forecast: ŷ_{t+1} = s_t
Time-Series (Double Exponential Smoothing)

  y_t, s_t, ŷ_t: respectively the actual, smoothed and predicted values at time t
  λ1, λ2: smoothing constants
  F_t: trend factor at time t
  Smoothing: s_t = λ1·y_t + (1 − λ1)·(s_{t−1} + F_{t−1})
  Trend: F_t = λ2·(s_t − s_{t−1}) + (1 − λ2)·F_{t−1}
  Forecast: ŷ_{t+m} = s_t + m·F_t
Time-Series (Trends + Seasonality)

Time-Series (Triple Exponential Smoothing)

  y_t, s_t, ŷ_t: respectively the actual, smoothed and predicted values at time t
  λ1, λ2, λ3: smoothing constants
  F_t: trend factor at time t
  S_t: seasonality factor at time t
  τ: length of the seasonal cycle
  Smoothing: s_t = λ1·(y_t / S_{t−τ}) + (1 − λ1)·(s_{t−1} + F_{t−1})
  Trend: F_t = λ2·(s_t − s_{t−1}) + (1 − λ2)·F_{t−1}
  Seasonality: S_t = λ3·(y_t / s_t) + (1 − λ3)·S_{t−τ}
  Forecast: ŷ_{t+m} = (s_t + m·F_t)·S_{t−τ+m}
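The smoothing recursions above fit in a few lines; a minimal sketch of the single and double (Holt) variants, with λ = λ1 = λ2 = 0.5 as illustrative constants:

```python
def single_smooth(ys, lam=0.5):
    """Single exponential smoothing: s_t = lam*y_t + (1-lam)*s_{t-1}.
    The one-step forecast is the last smoothed value."""
    s = ys[0]
    for y in ys[1:]:
        s = lam * y + (1 - lam) * s
    return s

def holt_forecast(ys, lam1=0.5, lam2=0.5, steps=1):
    """Double (Holt) smoothing with a trend factor F_t;
    forecast y_{t+m} = s_t + m * F_t."""
    s, f = ys[0], ys[1] - ys[0]
    for y in ys[1:]:
        s_prev = s
        s = lam1 * y + (1 - lam1) * (s + f)
        f = lam2 * (s - s_prev) + (1 - lam2) * f
    return s + steps * f

# A perfect linear trend: the trend term extrapolates it exactly.
print(holt_forecast([1, 2, 3, 4, 5], steps=1))  # 6.0
```

The triple (Holt-Winters) variant adds the seasonal index S_t in the same style.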
Query Frequency Time-Series

Time-Series Modeling of Queries for Detecting Influenza Epidemics

Ginsberg et al. [Nature 2009]
Time-series models for the 50 million most popular queries.
Time-Series Modeling of Queries for Detecting Influenza Epidemics

Publicly available historical data from the U.S. CDC was used to train the models.
The data was matched against the 50 million queries to find the ones with the
highest correlation.
Classifying Seasonal Queries by Time-Series

Classifying seasonal queries [Shokouhi, SIGIR 2011]
Periodicity Detection

Discrete Fourier Transform:
  Periodogram
  Auto-correlation

Periodogram:
  Accuracy deteriorates for large periods
  Spectral leakage

Auto-correlation:
  Automatic discovery of important peaks is more difficult
  Multiples of the same basic period also appear as peaks
  Low-amplitude events of high frequency may look less important

Combining periodogram + auto-correlation [Vlachos et al., SDM 2005]
Learning to Predict Query Frequency

Learning to predict frequency trends from time-series features [Radinsky et al., WWW 2012]
Temporal Dynamics of User Behavior
Long/Short History, Re-finding & Re-ranking

Long-term history

Richardson [TWEB 2008]:
  Query effects are long-lasting
  Users can be distinguished from their past queries
  Long-lasting effects are useful for studying topic hierarchies and the
  temporal evolution of queries
  Learning from common similar trends in histories is useful
    E.g., the relationship between a medical condition and its potential causes
Long-term history

Example: the medical use of caffeine for migraine is common, and migraine is
highly correlated with caffeine in users' search histories.
Similar pairings: baseball with beer; ski with wine.
Long-term history

Comparing users by their long history.
Temporal evolution of information needs.
Generating topic hierarchies:
  Long-term history can be more effective than short-term history for generating topics.
Long-term history

Temporal querying behaviour:
  Do men buy the ring first, or figure out how to propose?
Re-finding

Traces in the query logs of 114 anonymous users [Teevan et al., SIGIR 2007]:
  Up to 40% re-finding

Large-scale log analysis [Tyler & Teevan, WSDM 2010]:
  30% of single-click queries are re-finding
  5% of multi-click queries are re-finding
  66% of re-finding queries repeat a previous query for later re-findings
  48% of re-findings happen within a single session
Re-finding & Re-Ranking

Predicting personal navigation [Teevan et al., WSDM 2011]
Re-finding & Re-Ranking

Personal-level re-finding [Dou et al., WWW 2007]:
  #previous clicks on query-url pairs
  #previous clicks on urls from the same topic
  Re-ranking is most effective on common web search queries with high-entropy
  click distributions
  Using both short-term and long-term contexts is better than using either alone
Long-term vs. Short-term

Long vs. short history for search personalization [Bennett et al., SIGIR 2012]:
  Long-term gains are generally higher
  Long-term features are more effective for personalization early in the session
Cross-Device Search

People frequently search cross-device; 15% of cross-device searching is about
continuing a task [Wang et al., WSDM 2013]
Temporal Dynamics of User Behavior for Search Evaluation
Predicting Search Satisfaction & Click Modeling

Search Difficulty vs. Task Time

179 participants [Aula et al., CHI 2010]:
  Difficult tasks take longer
  More time is spent on difficult tasks, and more of it on the SERP
Implicit Measures for Evaluation

Fox et al. [TOIS 2005] compared several implicit signals:
  Such signals (e.g., SAT clicks) are particularly useful for training
  personalized rankers
  SAT-prediction accuracy based on result-level features
  Dwell time is positively correlated with SAT

Time to first click for SAT and DSAT prediction [Hassan et al., CIKM 2011]
References

Alexander Kotov, Paul N. Bennett, Ryen W. White, Susan T. Dumais, Jaime Teevan: Modeling and analysis of cross-session search tasks. SIGIR 2011: 5-14.
Sarah K. Tyler, Jaime Teevan: Large scale query log analysis of re-finding. WSDM 2010: 191-200.
Jaime Teevan: How people recall, recognize, and reuse search results. ACM Trans. Inf. Syst. 26(4) (2008).
Jaime Teevan, Eytan Adar, Rosie Jones, Michael A. S. Potts: Information re-retrieval: repeat queries in Yahoo's logs. SIGIR 2007: 151-158.
Jaime Teevan, Eytan Adar, Rosie Jones, Michael A. S. Potts: History repeats itself: repeat queries in Yahoo's logs. SIGIR 2006: 703-704.
Jaime Teevan: How people recall search result lists. CHI Extended Abstracts 2006: 1415-1420.
Zhicheng Dou, Ruihua Song, Ji-Rong Wen: A large-scale evaluation and analysis of personalized search strategies. WWW 2007: 581-590.
Jaime Teevan, Daniel J. Liebling, Gayathri Ravichandran Geetha: Understanding and predicting personal navigation. WSDM 2011: 85-94.
Matthew Richardson: Learning about the world through long-term query logs. TWEB 2(4) (2008).
Yu Wang, Xiao Huang, Ryen White: Characterizing and supporting cross-device search tasks. WSDM 2013.
Amanda Spink, Minsoo Park, Bernard J. Jansen, Jan O. Pedersen: Multitasking during Web search sessions. Inf. Process. Manage. 42(1): 264-275 (2006).
References

Anne Aula, Rehan M. Khan, Zhiwei Guan: How does search behavior change as search becomes more difficult? CHI 2010: 35-44.
Steve Fox, Kuldeep Karnawat, Mark Mydland, Susan T. Dumais, Thomas White: Evaluating implicit measures to improve web search. ACM Trans. Inf. Syst. 23(2): 147-168 (2005).
Ahmed Hassan, Yang Song, Li-wei He: A task level metric for measuring web search satisfaction and its application on improving relevance estimation. CIKM 2011: 125-134.
Zhen Liao, Yang Song, Li-wei He, Yalou Huang: Evaluating the effectiveness of search task trails. WWW 2012: 489-498.
Ryen W. White, Susan T. Dumais: Characterizing and predicting search engine switching behavior. CIKM 2009: 87-96.
Ryen W. White, Jeff Huang: Assessing the scenic route: measuring the value of search trails in web logs. SIGIR 2010: 587-594.
Thorsten Joachims, Laura A. Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, Geri Gay: Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Trans. Inf. Syst. 25(2) (2007).
Fan Guo, Chao Liu, Anitha Kannan, Tom Minka, Michael J. Taylor, Yi Min Wang, Christos Faloutsos: Click chain model in web search. WWW 2009: 11-20.
Yuchen Zhang, Weizhu Chen, Dong Wang, Qiang Yang: User-click modeling for understanding and predicting search behavior. KDD 2011: 1388-1396.
Georges Dupret, Benjamin Piwowarski: A user behavior model for average precision and its generalization to graded judgments. SIGIR 2010: 531-538.
Zeyuan Allen Zhu, Weizhu Chen, Tom Minka, Chenguang Zhu, Zheng Chen: A novel click model and its applications to online advertising. WSDM 2010: 321-330.
Olivier Chapelle, Ya Zhang: A dynamic bayesian network click model for web search ranking. WWW 2009: 1-10.
References

Anagha Kulkarni, Jaime Teevan, Krysta Marie Svore, Susan T. Dumais: Understanding temporal query dynamics. WSDM 2011: 167-176.
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, Ophir Frieder, David A. Grossman: Temporal analysis of a very large topically categorized Web query log. JASIST 58(2): 166-178 (2007).
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David A. Grossman, Ophir Frieder: Hourly analysis of a very large topically categorized web query log. SIGIR 2004: 321-328.
Kira Radinsky, Krysta Marie Svore, Susan T. Dumais, Jaime Teevan, Alex Bocharov, Eric Horvitz: Modeling and predicting behavioral dynamics on the web. WWW 2012: 599-608.
Fabrizio Silvestri: Mining query logs: Turning search usage data into knowledge. Foundations and Trends in Information Retrieval 4(1-2): 1-174 (January 2010).
Michail Vlachos, Philip S. Yu, Vittorio Castelli, Christopher Meek: Structural periodic measures for time-series data. Data Min. Knowl. Discov. 12(1): 1-28 (2006).
Michail Vlachos, Christopher Meek, Zografoula Vagena, Dimitrios Gunopulos: Identifying similarities, periodicities and bursts for online search queries. SIGMOD Conference 2004: 131-142.
Fernando Diaz: Integration of news content into web results. WSDM 2009: 182-191.
Arnd Christian König, Michael Gamon, Qiang Wu: Click-through prediction for news queries. SIGIR 2009: 347-354.
Yoshiyuki Inagaki, Narayanan Sadagopan, Georges Dupret, Anlei Dong, Ciya Liao, Yi Chang, Zhaohui Zheng: Session based click features for recency ranking. AAAI 2010.
Milad Shokouhi: Detecting seasonal queries by time-series analysis. SIGIR 2011: 1171-1172.
Jon M. Kleinberg: Bursty and hierarchical structure in streams. KDD 2002: 91-101.
Nish Parikh, Neel Sundaresan: Scalable and near real-time burst detection from eCommerce queries. KDD 2008: 972-980.
Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Philip S. Yu, Hongjun Lu: Parameter free bursty events detection in text streams. VLDB 2005: 181-192.
Jeremy Ginsberg, Matthew H. Mohebbi, Rajan S. Patel, Lynnette Brammer, Mark S. Smolinski, Larry Brilliant: Detecting influenza epidemics using search engine query data. Nature 457: 1012-1014 (19 February 2009).
Steve Chien, Nicole Immorlica: Semantic similarity between search engine queries using temporal correlation. WWW 2005: 2-11.
Silviu Cucerzan, Eric Brill: Extracting semantically related queries by exploiting user session information. Unpublished draft (submitted to WWW 2006, November 2005).
Kira Radinsky, Eugene Agichtein, Evgeniy Gabrilovich, Shaul Markovitch: A word at a time: computing word relatedness using temporal semantic analysis. WWW 2011: 337-346.
Qiankun Zhao, Steven C. H. Hoi, Tie-Yan Liu, Sourav S. Bhowmick, Michael R. Lyu, Wei-Ying Ma: Time-dependent semantic similarity measure of queries using historical click-through data. WWW 2006: 543-552.
Spatio-temporal and Socio-temporal Trends
WSDM 2013 Tutorial

Schedule

Introduction (9:00-9:15)
Modeling Dynamics
  9:15-10:15   Web content dynamics [Susan]
  10:15-10:45  Web user behavior dynamics [Milad]
  10:45-11:00  Break
  11:00-11:30  Web user behavior dynamics, cont'd
  11:30-13:00  Spatio-temporal analysis [Fernando]; Methods for evaluation
Lunch (13:00-14:30)
Applications to Information Retrieval
  14:30-15:45  Temporal NLP [Kira]; News event prediction
  15:45-16:00  Break
  16:00-17:45  Time-sensitive search [Yi]; Time-sensitive recommendations [Anlei]
Wrap-Up (17:45-18:00)
Multidimensional Dynamics

Information Exists in Context

Temporal:
  Does the document refer to a specific time?
  Does the information need refer to a specific time?
Geographic:
  Does the document refer to a specific location?
  Does the information need refer to a specific location?
Social:
  Does the document refer to a specific group of people?
  Does the information need refer to a specific group of people?
… and many, many others.
Multidimensional Modeling

Spatiotemporal: appropriate when we suspect both temporal and geographic salience.
Sociotemporal: appropriate when we suspect both temporal and social salience.
Spatiotemporal Modeling

Goal: study the ability to capture spatial and temporal aspects of topics.
Approach: study the ability to capture spatial and temporal aspects of
spatiotemporally acute events.
  Simplifies the task to topics likely to exhibit capturable behavior.
  Many spatiotemporally acute events receive a lot of query and document
  volume (e.g., natural disasters).

Spatiotemporal Modeling Case Studies

News: exploit text-based production to model topics over space and time.
Queries: exploit text-based demand to model topics over space and time.
Images: exploit image-metadata production to model topics over space and time.
Why experiment with news?

News articles often focus on temporally acute events:
  Natural disaster updates
  Political coverage

News corpora are easy to deal with:
  Availability (e.g., online, LDC)
  Standardization (e.g., LDC corpora, Reuters)
  Clean, journalistic language
  Reliable timestamps
Temporal Sensitivity of News Interest
[Figure: news (blue), social media (red), and query volume (green) for the 2010 New York tornado]
[Yom-Tov and Diaz 2011]

Geographic Sensitivity of News Interest
[Table: Spearman correlation between physical distance and the fraction of
media items and relevant queries, for each of the three events. All
correlations are statistically significant at p < 0.05.]
[Yom-Tov and Diaz 2011]
Modeling Spatiotemporal News
- Assume that words in an article are sampled from two underlying distributions:
  - a background language model: represents word usage common across time and geography (e.g. determiners, pronouns).
  - a spatiotemporal theme model: represents word usage specific to a time and place (i.e. an event).
[Mei et al. 2006]
Modeling Spatiotemporal News
[model figures; notation modified for clarity]
[Mei et al. 2006]
Modeling Spatiotemporal News
- Model parameters
  - background model: maximum likelihood estimate from corpus.
  - …similar for document model.
  - theme model: estimated by expectation maximization.
[Mei et al. 2006]
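The EM estimation of a theme model can be sketched as a minimal two-component mixture (one fixed background model, one theme); the function names and the mixing weight `lam` are illustrative assumptions, not taken from Mei et al.:

```python
from collections import Counter

def estimate_theme(docs, background, lam=0.5, iters=50):
    """EM for a two-component mixture: each word occurrence is drawn
    from the fixed background model with probability lam, otherwise
    from the theme model being estimated.
    docs: list of token lists; background: dict word -> probability."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    theme = {w: c / total for w, c in counts.items()}  # init: corpus MLE
    for _ in range(iters):
        expected = {}
        for w, c in counts.items():
            # E-step: posterior probability that an occurrence of w is thematic
            p_theme = (1 - lam) * theme[w]
            post = p_theme / (p_theme + lam * background.get(w, 1e-12))
            expected[w] = c * post        # M-step numerator: expected thematic count
        z = sum(expected.values())
        theme = {w: e / z for w, e in expected.items()}
    return theme

docs = [["storm", "hit", "the", "coast"],
        ["the", "storm", "flooded", "the", "city"]]
background = {"the": 0.5, "storm": 0.05, "hit": 0.05, "coast": 0.05,
              "flooded": 0.05, "city": 0.05}
theme = estimate_theme(docs, background)
```

The theme model concentrates mass on event words like "storm" while the background absorbs function words like "the".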
Modeling Spatiotemporal News
[figure: themes extracted from blog posts about Hurricane Katrina]
[Mei et al. 2006]
Modeling Spatiotemporal News
[figure: the "storm" theme broken down by state]
[Mei et al. 2006]
News Open Questions
- Task: how can this information be used for information access tasks?
- Granularity: how can we model small scale/"tail" events underrepresented in the national news?
Why experiment with queries?
- Queries sometimes focus on temporally acute events.
  - natural disaster queries
- Temporally acute queries are important
  - information need is urgent
  - high-visibility failure
Modeling Spatiotemporal Queries
[figures from Ginsberg et al. 2009; Carneiro and Mylonakis 2009; Backstrom et al. 2009]
Query Open Questions
- Task: how can this information be used for information access tasks?
- Granularity: how can we model small scale/"tail" events underrepresented in query logs?
- More dimensions: what other dimensions can be incorporated from query logs?
Why experiment with images?
- Photographs are taken at a specific time and place, often with keyword tags.
- Photograph corpora are easy to deal with
  - photographs exist in volume (people like to take pictures)
  - photographs have precise spatiotemporal data
  - photographs are manually tagged ("the food is bad but the portions are large")
Modeling Spatiotemporal Images
Problem definition: can time and place semantics for a tag be derived from the tag's location and time usage distribution?
[Rattenbury et al. 2007]
Modeling Spatiotemporal Images
- Short tags can often be attributed to the photo place or event.
- place tag: expected to exhibit significant spatial patterns.
- event tag: expected to exhibit significant temporal patterns.
- "significant pattern" refers to a burst of activity in space or time.
e.g. #wsdm2013 #rome
[Rattenbury et al. 2007]
Subtasks
1. scale specification: at what granularity should we look for patterns?
   - time: seconds? minutes? days?
   - space: neighborhood? city? state?
2. segment specification: how do we partition the dimension for analysis?
   - time: uniform segments? volume-weighted? consider diurnal patterns?
   - space: uniform grid? political boundaries (e.g. urban, state)?
[Rattenbury et al. 2007]
Subtasks
3. significance testing: is the behavior in this segment different from behavior outside of the segment?
   - time: compare to before and after? previous day? week? month? year?
   - space: compare to all surrounding? similar city?
4. determine event scale: how do we aggregate granular results to larger scales?
   - unsmoothed estimate?
   - repeat process for multiple scales?
[Rattenbury et al. 2007]
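The significance-testing subtask can be illustrated with a simple z-score of a tag's activity in one time segment against its overall rate; this statistic and the toy data are stand-ins, not the exact test from Rattenbury et al.:

```python
import math

def burst_score(tag_counts, total_counts, segment):
    """z-score of a tag's count in one time segment against its overall
    rate across all segments (binomial approximation).
    tag_counts / total_counts: per-segment photo counts."""
    p = sum(tag_counts) / sum(total_counts)   # tag's baseline rate
    n = total_counts[segment]
    expected = n * p
    sigma = math.sqrt(n * p * (1 - p)) or 1.0
    return (tag_counts[segment] - expected) / sigma

total = [100, 100, 100, 100]     # all geo-tagged photos per week
wsdm = [1, 2, 40, 1]             # photos tagged "#wsdm2013" per week
score = burst_score(wsdm, total, segment=2)
```

A large positive score in one segment marks an event-like burst; near-uniform tags (places, generic descriptors) stay close to zero everywhere.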
Experiments
- public photograph datasets (e.g. Flickr) often include rich space and time metadata.
- manually judge the events and locations referred to by tags.
- predict whether a tag refers to an event or location; compute precision and recall of labels in a ranked list of tags.
[Rattenbury et al. 2007]
Modeling Spatiotemporal Images
[result figures]
[Rattenbury et al. 2007]
Detecting Periodic Events in Spatiotemporal Images
[figure: images with tags f1, formulaone, unitedstatesgrandprix]
[Chen and Roy 2009]
Image Open Questions
- Task: how can this information be used for information access tasks?
- Granularity: how can we model small scale/"tail" events underrepresented in images?
- More dimensions: what other dimensions can be incorporated from images?
Sociotemporal Modeling
- Goal: study the ability to capture social and temporal aspects for topics.
- Approach: study the ability to capture social and temporal aspects for sociotemporally acute events.
  - often includes spatiotemporally acute events (news, especially if unexpected, attracts attention)
  - also includes completely virtual events (e.g. 'memes')
Sociotemporal Modeling Case Studies
- Video Sharing: users often watch and promote videos over social networks (e.g. email, instant messaging, microblogs).
- Information Seeking During Disaster: users often query for information about a disaster if social contacts are affected.
Types of Sociotemporal Topics
- Exogenous Critical: topic is propagated throughout the social network by an external stimulus (e.g. earthquake).
- Endogenous Critical: topic is propagated throughout the social network without external stimulus (e.g. lolcats).
- Exogenous Subcritical: topic does not spread despite external stimulus (e.g. car accident).
- Endogenous Subcritical: topic does not spread and is not externally stimulated.
[Crane and Sornette 2008]
Sociotemporal Dynamics of Video Sharing
- Corpus: time-stamped view information from a video-sharing site.
- Research Question: does the viewing information suggest an underlying epidemic model?
[Crane and Sornette 2008]
Types of Sociotemporal Behavior
[figures]
[Crane and Sornette 2008]
Sociotemporal Dynamics of Video Sharing
- Evidence supports the hypothesis of an epidemic process.
- No explicit signals of epidemic processes.
[Crane and Sornette 2008]
Information Seeking During Crisis
- Hypothesis: users with friends in areas affected by a crisis event are more likely to seek information about that event than those with no friends in those areas.
- Test: does personalizing ranking by local connections improve retrieval?
[Yom-Tov and Diaz 2010]
Crisis Interest and Social Connections
[Yom-Tov and Diaz 2010]
Social Contacts and Relevance During Crisis
[Yom-Tov and Diaz 2010]
Multidimensional Modeling Open Questions
- Formal Models: no general model capturing spatial, social, and temporal data.
- Tasks: need to develop/understand tasks for which multidimensional modeling is important.
- Corpora: need to develop standard corpora for sociotemporal modeling.
Methods for Evaluation
Time-Sensitive Tasks
- Web Search
- Topic Detection and Tracking (TDT)
- TREC 2011-2013 Microblog Track
- TREC 2013 Temporal Summarization Track
Web Search
- Task: given a query, provide a ranked list of documents satisfying the user's information need.
- Approach: collect relevance judgments and evaluate with a judgment-based metric, e.g. Normalized Discounted Cumulative Gain (NDCG).
[Jarvelin and Kekalainen 2002]
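A compact NDCG implementation in one common formulation (gain discounted by log2 of rank + 1); the grade values are illustrative:

```python
import math

def dcg(gains):
    """Discounted cumulative gain: graded gains in rank order,
    discounted by log2(rank + 1), ranks starting at 1."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains, k=None):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal
    (descending) ordering of the same gains."""
    if k is None:
        k = len(gains)
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0

# editorial grades in ranked order, e.g. perfect=4 ... bad=0
grades = [3, 2, 3, 0, 1, 2]
score = ndcg(grades, k=5)
```

A perfectly ordered list scores 1.0; any inversion (a high-gain document ranked below a lower-gain one) pulls the score below 1.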
Web Search
- Task: given a query, provide a ranked list of documents satisfying the user's information need.
- Approach: collect relevance judgments and evaluate with a judgment-based metric.
- Problem: for time-sensitive information needs, satisfaction may include more than topical relevance.
- Solution 1: introduce independent, time-sensitive judgments (time-sensitive gains).
Web Search
- Task: given a query, provide a ranked list of documents satisfying the user's information need.
- Approach: collect relevance judgments and evaluate with a judgment-based metric.
- Problem: for time-sensitive information needs, satisfaction may include more than topical relevance.
- Solution 2: rely on implicit behavior (e.g. user clicks) to capture the combined target.
[Wang et al. 2012]
Open Questions
- Query sampling: how to select queries likely to have temporal intent?
- Judge quality: how to select topics which are still in the judges' "memory"?
Topic Detection and Tracking
- Topic Tracking: keep track of stories similar to a set of example stories.
- Topic Detection: build clusters of stories that discuss the same topic.
- First Story Detection: detect if a story is the first story of a new, unknown topic.
[Allan 2002]
Detection-Error Tradeoff Evaluation
[Allan 2002]
Detection Cost
[Allan 2002]
Detection-Error Curve
[Allan 2002]
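The TDT detection cost combines miss and false-alarm rates with fixed costs and a target prior; the constants below are the commonly used NIST defaults, and the sample operating point is illustrative:

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized TDT-style detection cost:
    C_det = C_miss*P_miss*P_target + C_fa*P_fa*(1 - P_target),
    divided by the cost of the better trivial system (always-yes or
    always-no), so a useless system scores at least 1.0."""
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
    return c_det / min(c_miss * p_target, c_fa * (1 - p_target))

# a system missing 10% of targets with a 1% false-alarm rate
score = detection_cost(p_miss=0.10, p_fa=0.01)
```

Sweeping the decision threshold and plotting P_miss against P_fa yields the detection-error tradeoff (DET) curve referenced above.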
TREC Microblog
- Retrospective search of a microblog corpus (Twitter).
- Topic definition
  - title: short keyword-style query
  - description: longer explanation of intent
  - time: time at which the query should be issued
- Evaluation
  - topical relevance labels
  - use classic ad hoc metrics with predicted-relevant documents in reverse chronological order
[Soboroff et al. 2012]
TREC Microblog
- Online filtering of a microblog corpus (Twitter).
- Topic definition
  - title: short keyword-style query
  - description: longer explanation of intent
  - time range: times during which the filtering should occur
- Evaluation
  - topical relevance labels
  - use classic filtering metrics with predicted-relevant documents
[TREC Microblog 2012 Guidelines]
TREC 2013 Temporal Summarization Track
- Sequential Update Summarization: broadcast useful, new, and timely sentence-length updates about a developing event.
- Value Tracking: track the value of important event-related attributes (e.g. number of fatalities, financial impact).
Track Goals
- to develop algorithms which detect sub-events with low latency.
- to develop algorithms which minimize redundant information in unexpected news events.
- to model information reliability in the presence of a dynamic corpus.
- to understand and address the sensitivity of text summarization algorithms in an online, sequential setting.
- to understand and address the sensitivity of information extraction algorithms in dynamic settings.
Sequential Update Summarization
- corpus: stream of documents
- input: tracking query, event onset time
- output: relevant, novel, and timely text updates
- target: gold standard, time-stamped updates
Sequential Update Summarization
Corpus
- desired properties
  - timestamped documents
  - topically relevant
  - diverse
Input
- ~10 large events occurring in the timespan of the corpus
- <event onset time, keyword query>
- <event onset time, first Wikipedia revision>
Output
- timestamp of the system decision, not necessarily the source document
- id of sentence detected in the annotated corpus
- support: id of supporting document(s)
Gold Standard Output
- nuggets semi-automatically derived from Wikipedia revision history.
Evaluation
- precision: fraction of system updates that match any Gold Standard update.
- recall: fraction of Gold Standard updates that are matched by the system.
- novelty: fraction of system updates which did not match the same Gold Standard update.
- timeliness: difference between the system update time and the matched Gold Standard update time.
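Illustrative versions of the four measures, given a text-matching predicate; the matching function, credit assignment, and toy data are assumptions, not the track's official definitions:

```python
def ts_metrics(system, gold, match):
    """system, gold: lists of (timestamp, text) updates;
    match(sys_text, gold_text) -> bool.  Each system update is credited
    with the first gold update it matches."""
    hits = {}
    for s in system:
        for g in gold:
            if match(s[1], g[1]):
                hits[s] = g
                break
    distinct = set(hits.values())
    precision = len(hits) / len(system)
    recall = len(distinct) / len(gold)
    # novelty: matched system updates that are not repeats of the same nugget
    novelty = len(distinct) / len(hits) if hits else 0.0
    # timeliness: mean lag between system updates and their gold counterparts
    lags = [s[0] - g[0] for s, g in hits.items()]
    timeliness = sum(lags) / len(lags) if lags else None
    return precision, recall, novelty, timeliness

gold = [(0, "bridge collapses"), (10, "five dead")]
system = [(2, "bridge collapses"), (3, "bridge collapses"),
          (12, "five dead"), (4, "cat stuck in tree")]
p, r, nov, lag = ts_metrics(system, gold, lambda a, b: a == b)
```

Here the duplicated "bridge collapses" update hurts novelty but not precision, and the off-topic update hurts precision only.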
Value Tracking
- corpus: stream of documents
- input: tracking query, event onset time, attribute type
- output: running estimate of retrospective attribute value
- target: gold standard, retrospective attribute value
Value Tracking
Input
- ~10 large events shared with Task 1
- attributes
  - fatalities
  - financial impact
- <event onset time, keyword query, attribute type>
Output
- estimate
  - extractive
  - generative
- support: id of supporting document(s)
Gold Standard Output
- can be extracted from Wikipedia infoboxes
Evaluation
- cumulative error rate from event onset to the end of the stream.
Research Problems
- Errors in editorial data
  - older topics are harder to reliably evaluate
- Simulating historic system state
  - need to "rewind the corpus" to simulate the state of the index at retrieval/decision-making time
  - need to "rewind external information" to prevent "signals from the future"
Schedule
- Introduction (9:00-9:15)
- Modeling Dynamics
  - Web content dynamics [Susan] (9:15-10:15)
  - Web user behavior dynamics [Milad] (10:15-10:45)
  - Break (10:45-11:00)
  - Web user behavior dynamics, cont'd (11:00-11:30)
  - Spatio-temporal analysis [Fernando]; Methods for evaluation (11:30-13:00)
- Lunch (13:00-14:30)
- Applications to Information Retrieval
  - Temporal NLP [Kira]; News event prediction (14:30-15:45)
  - Break (15:45-16:00)
  - Time-sensitive search [Yi]; Time-sensitive recommendations [Anlei] (16:00-17:45)
- Wrap-Up (17:45-18:00)
buon appetito
Temporal NLP & News Prediction
WSDM 2013 Tutorial
Outline
- Temporal Language Models
  - Temporal Word Representation
  - Temporal Document Representation
  - Temporal Topics Representation
- Temporal Information Extraction
- Future Event Prediction from News
  - Future Event Retrieval from text
  - Future Event Retrieval from query stream
  - Future Event Retrieval from social media
- Temporal Summarization
  - Single Timeline
  - Multiple Timeline
Words Over Time
[figures: words correlate since 1800; words don't correlate before 1970 but correlate after 1970]
Words Over Time (Temporal Correlation)
1. Temporal representation of text: represent a word using its query volume (extending the static representation with temporal dynamics).
2. Temporal text-similarity measurement: cross correlation or DTW (a method for computing semantic relatedness using the temporal representation).
Steve Chien, Nicole Immorlica: Semantic similarity between search engine queries using temporal correlation. WWW 2005: 2-11
Temporal Correlation Methods (1): Dynamic time warping (DTW)
Time-weighted distance between time series A and B along an alignment path P = p_1 … p_k, where p_t = (i_t, j_t) pairs element i_t of A with element j_t of B:
  D(A, B) = Σ_{t=1..k} d(p_t) · w(t)
- d(p_t): distance between i_t and j_t
- w(t) > 0: weighting coefficient (with decay over time)
Best alignment path between A and B:
  P_0 = arg min_P D(A, B)
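A minimal DTW implementation of the distance above, with absolute difference as d and a uniform default w(t) (a decaying weight can be passed in):

```python
def dtw_distance(a, b, w=lambda t: 1.0):
    """Dynamic time warping by dynamic programming.  d(p_t) is the
    absolute difference of the aligned elements; w(t) is the weighting
    coefficient from the formula above (uniform here)."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1]) * w(max(i, j))
            # extend the cheapest of the three admissible path moves
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# two query-volume series with the same burst, shifted by one step
x = [0, 0, 5, 9, 5, 0]
y = [0, 5, 9, 5, 0, 0]
```

Because the path may warp time, the shifted bursts align and `dtw_distance(x, y)` is 0 even though the pointwise difference is large.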
Temporal Correlation Methods (2): Cross correlation
Time-weighted distance between A and B at shift s:
  D_s(A, B) = Σ_t d(a_t, b_{t+s}) · w(t),  s = 0, ±1, ±2, …
- w(t) > 0: weighting coefficient (with decay over time)
Best shift between A and B:
  P_0 = arg min_s D_s(A, B)
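The shifted comparison can be sketched as an unnormalized, time-weighted correlation maximized over integer lags; the lag window, weighting, and per-pair averaging are illustrative choices, not the exact formulation from the cited work:

```python
def xcorr_similarity(a, b, max_lag=3, w=lambda t: 1.0):
    """Best time-weighted correlation over integer shifts
    s = 0, +/-1, ..., +/-max_lag (unnormalized, averaged per pair)."""
    def corr_at(s):
        # pair a[t] with b[t+s] wherever both indices are in range
        pairs = [(a[t], b[t + s], w(t))
                 for t in range(len(a)) if 0 <= t + s < len(b)]
        return sum(wt * u * v for u, v, wt in pairs) / len(pairs)
    return max(corr_at(s) for s in range(-max_lag, max_lag + 1))

x = [0, 0, 5, 9, 5, 0]
y = [0, 5, 9, 5, 0, 0]   # same burst, one step earlier
best = xcorr_similarity(x, y)
```

The maximizing lag is the one that aligns the two bursts (here s = -1), so the best score exceeds the zero-lag score.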
Words Over Time (TSA)
1. Temporal representation of text: represent words as Wikipedia concept vectors (c_1 … c_n), extending the static representation with temporal dynamics.
2. Temporal text-similarity measurement: cross correlation or DTW (a method for computing semantic relatedness using the temporal representation).
Kira Radinsky, Eugene Agichtein, Evgeniy Gabrilovich, Shaul Markovitch: A word at a time: computing word relatedness using temporal semantic analysis. WWW 2011: 337-346
Words Over Time (Time Schemas)
1. Temporal representation of text: represent words as vectors over clicked web pages (P_1 … P_n), with time schemas such as <week, day of the week, hour>, …, extending the static representation with temporal dynamics.
2. Temporal text-similarity measurement: measure content similarity only during the time schemas (a method for computing semantic relatedness using the temporal representation).
Zhao et al.: Time-Dependent Semantic Similarity Measure of Queries Using Historical Click-Through Data. WWW 2006
Words Over Time (tLSA)
1. Temporal representation of text: extend the static representation with temporal dynamics.
2. Temporal text-similarity measurement: CANDECOMP/PARAFAC (CP) decomposition for tensors.
Yu Wang, Eugene Agichtein: Temporal latent semantic analysis for collaboratively generated content: preliminary results. SIGIR 2011
Documents Over Time (RHA)
Redefine term frequency (TF): a term is relatively important if it appears in the early revisions.
- First revision: "Topology, in mathematics, is both a structure used to capture the notions of continuity, connectedness and convergence, and the name of the branch of mathematics which studies these."
- Current version: "Topology (from the Greek τόπος, 'place', and λόγος, 'study') is a major area of mathematics concerned with spatial properties that are preserved under continuous deformations of objects; … basic examples include compactness and connectedness."
Ablimit Aji, Yu Wang, Eugene Agichtein, Evgeniy Gabrilovich: Using the past to score the present: extending term weighting models through revision history analysis. CIKM 2010: 629-638
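A toy version of revision-weighted term frequency; the geometric decay over the revision index is an illustrative choice, not the exact weighting scheme from Aji et al.:

```python
def rha_tf(term, revisions, decay=0.9):
    """Revision-history term frequency: occurrences in early revisions
    count more, via a geometric decay over the revision index.
    revisions: list of token lists, earliest revision first."""
    return sum((decay ** i) * rev.count(term)
               for i, rev in enumerate(revisions))

revisions = [
    "topology is a structure used to capture continuity".split(),
    "topology is a major area of mathematics".split(),
]
```

A term present from the first revision ("topology") accumulates more weight than one introduced later ("mathematics"), matching the intuition above.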
Documents Over Time (time series approach)
The temporal behavior of:
1. weak discriminators is easily described by a simple linear time series model;
2. useful discriminators' distribution over time is too erratic to describe faithfully with a linear model.
Miles Efron: Linear time series models for term weighting in information retrieval. JASIST 61(7): 1299-1312 (2010)
Common Time Series Approaches: The State Space Models
For example, semi-linear state space modeling:
  y_t = g(x_t) + ε_t
where y_t is the prediction for time t, ε_t is the error at time t, and x_t is the state vector at time t (including the last point, trend, etc.). Learn the structure and parameters, then predict y.
Hyndman, Koehler, Ord and Snyder, Forecasting with Exponential Smoothing: The State Space Approach, 2008
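Holt's linear trend method is a small concrete member of this family: the state vector is (level, trend), updated at each observation. The smoothing constants and data below are illustrative:

```python
def holt_forecast(y, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear trend method.  State = (level, trend):
      level_t = alpha*y_t + (1-alpha)*(level_{t-1} + trend_{t-1})
      trend_t = beta*(level_t - level_{t-1}) + (1-beta)*trend_{t-1}
    Forecast h steps ahead as level + h*trend."""
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend

forecast = holt_forecast([12.0, 13.0, 14.5, 16.0], horizon=2)
```

On a perfectly linear series the method recovers the line exactly; on noisy counts (e.g. a term's weekly frequency) it smooths level and trend separately.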
Topics Over Time
- Discretization: slicing time-ordered data into discrete subsets:
  - Train globally, inspect separately [Griffiths and Steyvers, 2004]
  - Train and inspect separately [Wang, Mohanty and McCallum, 2005]
- Being Markovian: topic transiting at certain time stamps:
  - The state at time t + 1 or t + Δt is independent of all other history given the state at time t.
  - State-Space model, Hidden Markov model, Kalman filters, etc. [Blei and Lafferty, 2006]
  - Continuous Time Bayesian Network [Nodelman et al., 2002]
- Graphical Models
  - Topics over Time (TOT) [Wang and McCallum, SIGKDD 2006]
  - PAM Over Time (PAMTOT) [Li, Wang and McCallum, AAAI Workshops 2006]
LDA and Topics over Time (ToT)
[graphical models for LDA (sampling) and TOT]
[Wang and McCallum, SIGKDD 2006]
Outline
- Temporal Language Models
  - Temporal Word Representation
  - Temporal Document Representation
  - Temporal Topics Representation
- Temporal Information Extraction
- Future Event Prediction from News
  - Future Event Retrieval from text
  - Future Event Retrieval from query stream
  - Future Event Retrieval from social media
- Temporal Summarization
  - Single Timeline
  - Multiple Timeline
Temporal Information Extraction
Feb. 04, 2013: "Yesterday Holly was running a marathon when she twisted her ankle. David had pushed her."
[timeline figure: relations (before, during, finishes) among push, run, and twist-ankle, anchored at 02/03/2013 and 02/04/2013]
[Mani, IJCAI Tutorial 2007]
- Input: a natural language discourse
- Output: representation of events and their temporal relations
- Applications:
  - Temporal QA
  - Temporal Summarization
  - Temporal Expressions in Query Log
Temporal Information Extraction
- Temporal entity (events and attributes) recognition
  - Knowledge-based methods (dictionary and rules)
  - ML-based methods (annotated corpus)
  - TimeML
  - Time Expression Recognition and Normalization (TERN)
- Temporal relations discovery
  - Absolute relations – placing events on a timeline
  - Relative relations – relations between events
- Temporal reasoning
  - Allen's Interval-Based Ontology [Allen, AI'84]
Example: Temporal Web-Mined Rules
- Lexical relations (capturing causal and other relations, etc.)
  - kill => die (always)
  - push => fall (sometimes: "Max fell. John pushed him.")
- Idea: leverage the distributions found in large corpora
- VerbOcean: database from ISI that contains lexical relations mined from Google searches
  - E.g., X happens before Y, where X and Y are WordNet verbs highly associated in a corpus
  - Yields 4199 rules!
Corpora
- News (newswire and broadcast)
  - TimeML: TimeBank, AQUAINT Corpus (all English)
  - TIMEX2: TIDES and TERN English Corpora, Korean Corpus (200 docs), TERN Chinese and Arabic news data (extents only)
- Weblogs
  - TIMEX2 TERN corpus (English, Chinese, Arabic – the latter with extents only)
- Dialogues
  - TIMEX2 – 95 Spanish Enthusiast dialogs, and their translations
- Meetings
  - TIMEX2 Spanish portions of UN Parallel corpus (23,000 words)
- Children's Stories
  - Reading Comprehension Exams from MITRE, Remedia: 120 stories, 20K words; CBC: 259 stories, 1/3 tagged, ~50K
Links
- TimeBank: http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T08
- TimeML: www.timeml.org
- TIMEX2/TERN ACE data (English, Chinese, Arabic): timex2.mitre.org
- TIMEX2/3 Tagger: http://complingone.georgetown.edu/~linguist/GU_TIME_DOWNLOAD.HTML
References
1. Berrazega (2012). Temporal information extraction: A survey. International Journal on Natural Language Computing (IJNLC).
2. Ling, X., & Weld, D. (2010). Temporal information extraction. In Proceedings of AAAI.
3. Yoshikawa, K., Riedel, S., Asahara, M., & Matsumoto, Y. (2009). Jointly identifying temporal relations with markov logic. In Proceedings of ACL-IJCNLP.
4. Tatu, M., & Srikanth, M. (2008). Experiments with reasoning for temporal relations between events. In Proceedings of COLING.
5. Chambers, N., Wang, S., & Jurafsky, D. (2007). Classifying temporal relations between events. In Proceedings of ACL (Poster).
6. Lapata, M., & Lascarides, A. (2006). Learning sentence-internal temporal relations. Journal of Artificial Intelligence Research (JAIR), 27, 85–117.
7. Mani, I., Pustejovsky, J., and Gaizauskas, R. (eds.). (2005). The Language of Time: A Reader. Oxford University Press.
8. Mani, I., and Schiffman, B. (2004). Temporally Anchoring and Ordering Events in News. In Pustejovsky, J. and Gaizauskas, R. (eds), Time and Event Recognition in Natural Language. John Benjamins.
9. Mani, I. (2004). Recent Developments in Temporal Information Extraction. In Nicolov, N., and Mitkov, R., Proceedings of RANLP'03. John Benjamins.
10. Jang, S., Baldwin, J., and Mani, I. (2004). Automatic TIMEX2 Tagging of Korean News. In Mani, I., Pustejovsky, J., and Sundheim, B. (eds.), ACM Transactions on Asian Language Processing: Special issue on Temporal Information Processing.
11. Mani, I., Schiffman, B., and Zhang, J. (2003). Inferring Temporal Ordering of Events in News. Short paper. In Proceedings of HLT-NAACL'03.
12. Ferro, L., Mani, I., Sundheim, B. and Wilson, G. (2001). TIDES Temporal Annotation Guidelines, Draft Version 1.02. MITRE Technical Report MTR 01W000004. McLean, Virginia: The MITRE Corporation.
Outline
- Temporal Language Models
  - Temporal Word Representation
  - Temporal Document Representation
  - Temporal Topics Representation
- Temporal Information Extraction
- Future Event Prediction from News
  - Future Event Retrieval from text
  - Future Event Retrieval from query stream
  - Future Event Retrieval from social media
- Temporal Summarization
  - Single Timeline
  - Multiple Timeline
Future Event Retrieval from Text (Textual Entailment)
- A directional relation between two text fragments, Text (t) and Hypothesis (h): t entails h (t => h) if humans reading t will infer that h is most likely true.
- Common solutions: compute similarity features (lexical, n-gram, syntactic, semantic, global) for the pair (t, h); a classifier over the feature vector outputs YES or NO.
Androutsopoulos and Malakasiotis, JAIR'10; Glickman, Dagan, Koppel, AAAI'05; Dagan, Roth, Zanzotto, ACL'07
http://aclweb.org/aclwiki/index.php?title=Textual_Entailment_Portal
Future Event Retrieval from Text (Text Prediction)
- Template-based approaches [Girju and Moldovan, FLAIRS 2000]
  - Discover lexico-syntactic patterns that can express the causal relation
  - Validate and rank the ambiguous patterns acquired, based on semantic constraints on nouns and verbs
- Co-occurrence approaches [Gordon, Bejan, and Sagae, AAAI 2011]
  - PMI approaches on words
  - Sentence proximity in a corpus (e.g., blogs)
- Human-labeled corpora
  - FrameNet
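The PMI co-occurrence signal mentioned above, sketched over a tiny corpus of set-of-words sentences (the corpus is invented for illustration):

```python
import math

def pmi(x, y, sentences):
    """Pointwise mutual information between two words over a corpus of
    sentences (each a set of words): log2( P(x,y) / (P(x) * P(y)) ),
    with probabilities estimated by sentence counts."""
    n = len(sentences)
    cx = sum(1 for s in sentences if x in s)
    cy = sum(1 for s in sentences if y in s)
    cxy = sum(1 for s in sentences if x in s and y in s)
    return math.log2(cxy * n / (cx * cy))

corpus = [
    {"earthquake", "tsunami", "warning"},
    {"earthquake", "damage"},
    {"election", "results"},
    {"tsunami", "earthquake"},
]
score = pmi("earthquake", "tsunami", corpus)
```

A positive PMI marks word pairs that co-occur more often than chance, the raw signal these causal-relation miners rank and filter.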
Future Event Retrieval from Text (Generalized Text Prediction)
[figure: the past event "US Army bombs a weapon warehouse in Kabul with missiles" (1/2/1987, 11:00AM, +2h) is abstracted through an ontology of linked data (http://www.linkeddata.org) into a generalization rule (Actor: [state of NATO], Property: [Hit1.1], Theme: [Military facility], Location: [Arab City]) which also covers events such as "NATO strikes an army base in Baghdad"]
Radinsky, Davidovich, and Markovitch. Learning causality for news events prediction. WWW 2012.
Prediction Rule Generation
[figure: the cause event "US Army bombs a weapon warehouse in Kabul with missiles" (1/2/1987, 11:00AM, +2h) and the effect event "5 Afghan troops were killed" (1/2/1987, 11:15AM, +3h) yield the rule: Effect.Action = kill; Effect.Theme = Troops; Effect.Theme.Attribute(Nationality) = Cause.Location.Country]
Culturomics
- How long is history remembered?
- Detecting censorship and suppression
- Language evolution: size of lexicon, evolution of grammar
- Women rock!
Kalev Leetaru. First Monday, 15(9), 2011; Michel et al., Science 2011; Yeung and Jatowt, CIKM'11
Future Event Retrieval using query stream
- Using query volume [Ginsberg et al., Nature 2009]
Future Event Retrieval using query stream
- Using query correlations [Radinsky et al., WI'08]
Goal: for each candidate term, evaluate the probability that it appears in the future, given today's terms.
[figure: today's salient terms (e.g. storm, hurricane, gas, china) place indication weights on future candidate terms (e.g. evacuation, flood, weather, war), each with a likelihood to appear in k days]
Future Event Retrieval using query stream
- Using relevant documents for future event prediction [Amodeo, Blanco, Brefeld, CIKM'11]
- Based on the publication dates of results, build a probabilistic model
Future Event Retrieval from social media
- Predicting using linear regression on chatter rate [S. Asur and B. A. Huberman. Predicting the future with social media, 2010]
- Predicting using syntactic and semantic features extracted from text and meta-text [M. Joshi, D. Das, K. Gimpel, and N. A. Smith. Movie reviews and revenues: An experiment in text regression. In Proc. of NAACL-HLT, 2010]
- Predicting using sentiment analysis [G. Mishne. Predicting movie sales from blogger sentiment. In AAAI Spring Symposium, 2006]
Predict future posts
- Using trending topic modeling and historical data [Wang, Agichtein and Benzi, KDD'12]
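The chatter-rate idea reduces to fitting a line from social-media volume to the target quantity; a self-contained least-squares sketch on invented toy data (the real features in Asur and Huberman are tweet rates in the days before release):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# hypothetical data: tweets/hour about a movie vs. opening revenue
chatter = [1.0, 2.0, 4.0, 8.0]
revenue = [3.0, 5.0, 9.0, 17.0]
a, b = fit_line(chatter, revenue)
predicted = a * 16.0 + b   # forecast for an unseen chatter rate
```

The syntactic/semantic-feature and sentiment variants cited above keep the same regression skeleton and only change what goes into `xs`.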
Outline
- Temporal Language Models
  - Temporal Word Representation
  - Temporal Document Representation
  - Temporal Topics Representation
- Temporal Information Extraction
- Future Event Prediction from News
  - Future Event Retrieval from text
  - Future Event Retrieval from query stream
  - Future Event Retrieval from social media
- Temporal Summarization
  - Single Timeline
  - Multiple Timeline
Temporal Summarization
- Topic detection and tracking (TDT)
  - Lexical similarity, temporal proximity, query relevance, clustering techniques, etc. [Allan 02; Allan, Carbonell, Doddington, Yamron, Yang 98; Yang, Pierce, Carbonell SIGIR'98; Zhang, Yang, Ghahramani NIPS'04]
  - Named entities, date or place information, domain knowledge [Kumaran and Allan SIGIR'04]
- Temporal summarization / storylines
  - Does not seek to cluster "topics" as in TDT, but to utilize evolutionary correlations of news coherence/diversity for summarization [Yan and Zhang SIGIR'11; Shahaf and Guestrin KDD 2010; Shahaf, Guestrin, Horvitz WWW 2012; Allan, Gupta, and Khandelwal SIGIR'01]
Storyline Construction
Chieu and Lee SIGIR'04
Good Story Chain (Coherence)
- Incoherent: each pair of documents shares different words.
- Coherent: a small number of words captures the whole story.
[Shahaf and Guestrin, KDD 2010]
Good Story Chain (Word Influence)
Take into consideration the influence of document d_i on d_{i+1} through the word w. The influence is high if: (1) the two documents are highly connected, and (2) w is important for the connectivity. Formulated as linear programming problems.
[Shahaf and Guestrin, KDD 2010]
Good Multiple Story Chains
Consider all coherent maps with maximum possible coverage; find the most connected.
[figure: an information map for the query "Clinton", linking headlines such as "Clinton set for Dublin", "High hopes for Clinton visit", "Clinton visits Belfast", "Clinton, Religious Leaders Share Thoughts", "Church Leaders Praise Clinton's 'Spirituality'", "Religion Leaders Divided on Clinton Moral Issue", and "Clinton Should Resign, 2 Religious Leaders Say"]
Shahaf, Guestrin, Horvitz: Trains of thought: generating information maps. WWW 2012.
Good Multiple Story Chains
Given documents D:
1. Build a coherence graph G, which encodes all m-coherent chains as graph paths.
2. Define a coverage function f.
3. Increase connectivity via submodular orienteering [Chekuri & Pal, 2005]: a quasipoly-time recursive greedy algorithm with O(log OPT) approximation.
Shahaf, Guestrin, Horvitz: Trains of thought: generating information maps. WWW 2012.
Timelines with Images
Wang, Li, Ogihara. AAAI'12
Online Timeline Creation
- A. Ahmed, Q. Ho, J. Eisenstein, E. Xing, A. J. Smola, and C. H. Teo. Unified analysis of streaming news. In Proc. of WWW, 2011.
- J. Kleinberg. Bursty and hierarchical structure in streams. In KDD, 2002.
- J. Kleinberg. Temporal dynamics of on-line information systems. Data Stream Management: Processing High-Speed Data Streams. Springer, 2006.
- L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In KDD, pages 937–946, 2009.
Online Clustering Model: Recurrent Chinese Restaurant Process
A. Ahmed et al. WWW, 2011.
Topic Model: LDA (reminder)
Online Storyline Model
Inference: Particle Filtering
Time-sensitive Search & Recommendation
WSDM 2013 Tutorial
Outline
- Modeling Dynamics
  - Web content dynamics [Susan]
  - Web user behavior dynamics [Milad]
  - Spatio-temporal analysis [Fernando]
  - Methods for evaluation
- Applications to Information Retrieval
  - NLP [Kira]
  - News event prediction [Kira]
  - Time-sensitive search [Dong/Chang]
  - Recommendations [Dong/Chang]
Outline
- Time-sensitive search
  - Time-sensitive ranking relevance
  - Time-sensitive query suggestion
  - Federated search
- Time-sensitive recommendation

Applications on Search (SERP)
- Pre-submit: query suggestion (2)
- Post-submit: ranking (1), federated search (3)
Applications on Recommendation (Portal)
Outline (Anlei Dong and Yi Chang)
- Time-sensitive search
  - Time-sensitive ranking relevance
  - Time-sensitive query suggestion
  - Federated search
- Time-sensitive recommendation
Applications of Time-Sensitive Ranking
- Also called time-aware ranking or recency ranking
- Web search
- Vertical search
  - News search
  - Video search
  - Blog search
  - E-commerce search
  - …
Problem
 Ranking relevance has many factors
 Topical relevance
 Authority/popularity/spam
 Freshness
 Local
 Revenue
 …
 How to appropriately combine these factors?
 Freshness + other (traditional) relevance
Outline for Time-Sensitive Ranking Relevance
 Rule-based approaches
 A learning-to-rank practice
 Leveraging Twitter data for improvement
 Joint optimization of relevance and freshness
 Further study: user behavior data
Yearly Recurrent Queries
 "WSDM", "SIGIR", "Christmas", "Black Friday", etc.
 Possible solution: query re-writing
 Solution 1: query expansion
 For example, rewrite the query "sigir" to "sigir 2009", BUT
 This changes the query intent, and
 www.sigir.org is better than www.sigir2009.org
 Solution 2: double search
 Search first with the original query, "sigir"
 Search second with the expanded query, "sigir 2009"
 Then blend the two result lists, BUT
 This raises a capacity problem and needs a blending algorithm
Another Simple Formula
 Combine relevance and freshness by a heuristic rule
 Exponential time-decay rule, e.g., [Del Corso WWW05]
 Advantages
 Little training data needed; fast product delivery
 Reasonably good ranking results in practice
 Disadvantage
 Far from optimal
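A minimal sketch of such an exponential time-decay rule (the half-life parameterization is an illustrative assumption, not the exact form from [Del Corso WWW05]):

```python
import math

def time_decayed_score(relevance, age_hours, half_life_hours=24.0):
    """Exponential time-decay combination: the relevance score is
    discounted by a freshness factor that halves every half_life_hours."""
    decay = math.exp(-math.log(2.0) * age_hours / half_life_hours)
    return relevance * decay
```

With a 24-hour half-life, a day-old document keeps half of its relevance score; the single decay parameter is exactly why such rules need little training data but are far from optimal.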
Learning-to-Rank Solution
 Learning to rank: please check the tutorial [Liu WWW09]
 A standard approach: data + features feed a learning-to-rank algorithm
Main Challenges
 Feature challenges
 A precise timestamp for each URL is hard to get
 Little click information for a fresh URL
 Few anchor texts for a fresh URL
 Data challenges
 Crawling challenge
 Labeled-data collection challenge
 Appropriate evaluation metrics
 Ranking algorithm challenges
 Traditional ranking is poor, since fresh documents lack link or click information
 Merging different sources of results into one ranking
Data: Editorial Label
 Traditional data label:
 <query, URL> -> relevance ∈ {perfect, excellent, good, fair, bad}
 Incorporate time:
 <query, URL, query_time> ->
 relevance ∈ {perfect, excellent, good, fair, bad}
 freshness ∈ {latest, ok, a little bit old, totally outdated}
 Learning target: combine the relevance and freshness labels
 For example, recency promotion/demotion: {+1, 0, -1, -2}
[Dong WSDM10]
Freshness: Judge vs. Age

Subjective vs. objective
Data: Editorial Data Collection

Need to collect data periodically
 Avoid
distribution bias
 Judge immediately
Feature
 An ideal case: age = query issue time - publish time
 But most pages do not have an accurate time!
 Some intuitive features
 Timestamp feature
 Discovery time feature
 Query time-sensitivity feature
 Page classification feature
Click Feature
 Challenge: limited clicks on fresh URLs
 Solution:
 A user may issue a chain of queries for the same information need: queries in the chain are strongly related
 Use query chains to "smooth" clicks
[Inagaki AAAI10]
Extend Clicks
[Figure: queries "circus", "circus album", and "Britney Spears album" with their first-page URLs (www.ringling.com, en.wikipedia.org/wiki/Circus, www.youtube.com/watch?v=1zeR3NSYcHk, en.wikipedia.org/wiki/Circus_(Britney_Spears_album), www.metrolyrics.com/circus-lyrics-britney-spears.html, britneyspearscircus.net). Solid arrows: real clicks; dotted arrows: clicks inferred from the query chain]
Time-Weighted Click Features
 Recent clicks must be weighted more
 The shift of user intent must be taken into consideration
 E.g., should we still rank B. Spears' "Circus" at the top for the query "Circus" after 12 months?
 Time-weighted CTR (i refers to the day; x controls the time decay):

CTR_w(q, u, t_q) = [ Σ_{i=1, v_i>0}^{t_q} c_i (1+x)^(i-t_q) ] / [ Σ_{i=1, v_i>0}^{t_q} v_i (1+x)^(i-t_q) ]
Click Buzz Feature
 CTR change over time
 Compute the average CTR_avg over a period of time and its standard deviation σ
 BUZZ at a given day t is (CTR_t - CTR_avg) / σ
 Represents how unusual the current CTR is with respect to the "normal" CTR for that URL
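The buzz feature above is a simple z-score over the URL's CTR history:

```python
import statistics

def click_buzz(ctr_history, ctr_today):
    """How unusual today's CTR is for this URL, measured in standard
    deviations from its historical mean CTR."""
    avg = statistics.mean(ctr_history)
    sigma = statistics.stdev(ctr_history)
    return (ctr_today - avg) / sigma
```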
Modeling: Leverage Regular Data
 Premise of improving recency: overall relevance should not be hurt!
 Recency training data
 Small amount of query-URLs -> poor relevance
 Regular training data
 Huge amount of query-URLs -> good relevance
 Solution
 Utilize regular data or a regular model to help recency ranking
Combine Relevance and Recency Data
 Dedicated model: recency data; recency features + regular features; GBrank
 Over-weighting model: recency data + regular data; recency features + regular features; GBrank
 Compositional model: recency data; recency features + ranking score; GBrank
 Adaptation model: recency data; recency features + regular features; regular model as base model, then adaptation
[Dong WSDM10]
Model Adaptation
 Motivation: solve data scalability issues
 Expensive to have high-quality training data for each market/task
 Background:
 Model adaptation is one approach to transfer learning
 Goal: transfer knowledge learned from task A to task B
 Assumption: there is similarity between A and B
 Approach:
 Train a base model A (using Data A)
 Modify model A using Data B -> model A'
 Apply adapted model A' to task B
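The approach can be sketched as additive adaptation: keep the base model and fit a correction on the target-task data. The constant-offset correction below is a simplifying assumption for illustration; in practice the correction would be, e.g., additional boosted trees:

```python
def adapt_model(base_model, data_b):
    """Additive model adaptation sketch: fit a correction term on the
    residuals of the base model over task-B data, then return the
    adapted model A' = A + correction."""
    residuals = [y - base_model(x) for x, y in data_b]
    offset = sum(residuals) / len(residuals)  # constant correction
    return lambda x: base_model(x) + offset
```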
Online Over-Weighting Results
[Charts: DCG1 and DCG5 gains (%) on Day1, Day2, Day3, and Day123, each comparing "nodemote" vs. "demote"]
Query Classification vs. Query Feature
 Approach 1: query classification
 Step 1: determine the query type
 Breaking-news query? Yearly-recurrent query? …
 Step 2: apply the corresponding ranking model
 Divide-and-conquer strategy
 Effective and straightforward in practice
 Approach 2: query feature
 A single unified model for all queries
 E.g., [Dai SIGIR11]
Query Classifier
 Identify, in near real-time, queries about emerging events and news stories
 E.g., natural disasters; major sports events; latest celebrity gossip; political breaking stories; etc.
Query Classifier
 Standard approach:
 Maintain a temporal model for each query
 Identify irregularities in the model, e.g., a change in moving average of more than nσ
 Works well for head queries, not so well for torso/tail queries
One New Approach
 Rather than maintaining a model for each query, maintain a model of each slot of time
 Given a query, determine whether it is predicted by recent models better than by earlier ones
 In practice:
 Time slot modeling: n-gram language models
 Model prediction: language model generation likelihood
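A minimal sketch of the idea with unigram language models (the tutorial uses interpolated bigram models; the add-alpha smoothing and vocabulary size here are illustrative assumptions):

```python
import math
from collections import Counter

def lm_loglik(query_terms, slot_counts, vocab_size, alpha=1.0):
    """Log-likelihood of the query under an add-alpha-smoothed unigram
    LM built from one time slot's term counts."""
    total = sum(slot_counts.values())
    return sum(math.log((slot_counts[t] + alpha) / (total + alpha * vocab_size))
               for t in query_terms)

def buzziness(query_terms, recent_counts, reference_counts, vocab_size=10000):
    """Positive when the recent slot's LM predicts the query better than
    a reference slot's LM (e.g., previous day/week/month)."""
    return (lm_loglik(query_terms, recent_counts, vocab_size)
            - lm_loglik(query_terms, reference_counts, vocab_size))
```

A query whose terms surged in the current slot scores positive against all reference slots, which flags it as buzzy regardless of whether it is a head or tail query.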
Compute "Buzziness"
 Approach
 Reference models r_i = {prev_day, prev_week, prev_month}
 Language model settings: interpolated bigram model
 Score computation using the query model
Content Model
 Not all current events are reflected in the query log
 In addition to tracking the query log, we track news headlines from Yahoo! News
 Top viewed: U.S., Business, World, …
 RSS feeds updated every 30 minutes
 Content used for building similar time-slotted LMs
 Score: blend the query and content models
Data Blending
 Results from 2+ scoring functions
[Figure: web index, real-time index, and news/image index rankings are blended into the organic results]
 Single organic result list that maximizes relevance
Incorporate Twitter Data to Improve Real-Time Web Search
 Goal: improve web search ranking, not Twitter search
 Micro-blogging concepts in Twitter:
 Tweet
 Twitter user
 Twitter tiny URL (Twitter URL)
 Following relationship
[Dong WWW10]
Question
 Can we make use of Twitter to improve real-time crawling?
 Can we utilize tweets to improve Twitter tiny URL ranking?
 Can we use the social network of Twitter users to improve Twitter tiny URL ranking?
Motivation
 Twitter tiny URLs contain news and non-news URLs, and could represent the diverse and dynamic browsing priorities of users
 The social network among Twitter users could provide a method to compute the popularity of Twitter users and the authority of fresh documents
 Tweets could be leveraged as an extended representation of a Twitter tiny URL
Crawling Strategy
 An exhaustive crawling strategy for fresh content in real time is difficult
 Select high-quality Twitter tiny URLs as crawling feeds
 Twitter tiny URLs can reflect the diverse and dynamic browsing priorities of users
 Human intelligence is incorporated into the real-time crawling/indexing system
Crawl Twitter Tiny URLs
 The majority of Twitter tiny URLs are of poor quality
 Spam, adult, self-promotion, etc.
 A set of simple heuristic rules
 Discard tiny URLs referred to by the same Twitter user more than 2 times
 Discard tiny URLs referred to by only one Twitter user
 Experiment
 Based on 5 hours of Twitter data, about 1 million tiny URLs
 After filtering with the rules, 5.9% high-quality tiny URLs remain
Twitter Features
 Text matching between query and tweet
 Cosine similarity
 Exact matching
 Proximity matching
 Overlapping terms
 Extra terms
 Missing terms
 User-authority-weighted proximity matching
Textual Features between Query and Tweet
 Tweets can serve as a real-time substitute for anchor text
Social Network Features
 Represent Twitter users as a social network
 A vertex represents a Twitter user
 An edge represents the follower relationship
 Apply the PageRank idea
 The popularity of Twitter users is obtained when the iteration converges
 The popularity information is used to update the user-authority-weighted proximity matching
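A minimal PageRank sketch over the follower graph (the damping factor and iteration count are illustrative assumptions; authority flows from follower to followee, so users followed by popular users score high):

```python
from collections import Counter

def follower_pagerank(nodes, edges, d=0.85, iters=50):
    """PageRank over a follower graph.  edges are (follower, followee)
    pairs; dangling users' mass is redistributed uniformly."""
    n = len(nodes)
    out_deg = Counter(f for f, _ in edges)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - d) / n for u in nodes}
        dangling = sum(rank[u] for u in nodes if out_deg[u] == 0)
        for f, g in edges:
            new[g] += d * rank[f] / out_deg[f]
        for u in nodes:
            new[u] += d * dangling / n
        rank = new
    return rank
```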
Other Features
 Given a tiny URL, other URL-based features include:
 Average count features of the users who refer to the tiny URL
 Count features related to the 1st Twitter user who referred to the tiny URL
 Count features related to the most popular Twitter user who referred to the tiny URL
 Count features
 # of followers of this user
 # of followings of this user
 # of posts by this user
 # of users who retweeted the tiny URL
 # of users who replied to the tiny URL
Ranking Strategy
 MLR for regular URLs: regular data; content features + aggregate features
 MLR for Twitter URLs: Twitter (regular) data; content features + Twitter features

Different Ranking Models
 Each MLR model is trained with the Gradient Boosted Decision Tree (GBDT) algorithm
Rationale of Each Model
 MLR + blending combinations (model for regular URLs / model for Twitter URLs), with advantages and disadvantages:
 MRegular / MRegular: favors regular URLs, disfavors Twitter URLs; Twitter tiny URLs will not get promoted
 MContent / MContent: favors Twitter tiny URLs, disfavors regular URLs
 MRegular / MContent: tiny URLs will be promoted, but the relevance of tiny URLs might not be fully leveraged
 MRegular / MTwitter and MRegular / MComposite: tiny URLs will be promoted, with Twitter features leveraged
Ranking Result
MLR + blending:

| Regular URL model | Twitter URL model | NDCG5 (regular URLs) | NDCG5 (Twitter URLs) | NDCG5 (+ recency demotion) |
| MRegular | MRegular | 0.681 | 0.518 | 0.666 |
| MContent | MContent | 0.682 (+0.3%) | 0.587 (+11.7%) | 0.652 (-2.1%) |
| MRegular | MContent | 0.690 (+1.3%) | 0.569 (+8.9%) | 0.680 (+2.1%) |
| MRegular | MTwitter | 0.729 (+6.5%) | 0.736 (+29.6%) | 0.739 (+9.9%) |
| MRegular | MComposite | 0.723 (+5.8%) | 0.756 (+31.4%) | 0.735 (+9.4%) |
Main Findings
 Twitter does contain high-quality tiny URLs that are relevant to some time-sensitive queries
 The text of tweets can substitute for anchor text for real-time-relevant documents
 The social network of Twitter users can be used to improve ranking
Simultaneously Optimize Freshness and Relevance
 [Dai SIGIR11]
 Criteria-sensitive divide-and-conquer ranking
 Multiple rankers corresponding to different query categories
 Train each ranker over Q, the training query set, weighting each query q by I(q, i), the importance of q with respect to the i-th ranking model
Study User Behavior
 Relevance
 Topical relatedness
 Metrics: tf*idf, BM25, language model
 Freshness
 Temporal closeness
 Metrics: age, elapsed time
 Trade-off
 Serve the user's information need
Understand User’s Information
Need

User’s emphasis on relevance/freshness
varies
 Breaking
news queries
 Prefer latest news reports – freshness driven
 E.g., “apple company”
 Newsworthy queries
 Prefer high coverage and authority news
reports – relevance driven
 E.g., “bin laden death”
Relevance/Freshness Varies
[Figure: CTR distributions for breaking-news queries vs. newsworthy queries]
[Wang WWW10]
Access User’s Information
Need

Unsupervised integration [Efron SIGIR11, Li CIKM03]
 Limited

on timestamps
Editor’s judgment [Dong WSDM10, Dai SIGIR11]
 Expensive
for timely annotation
 Inadequate to recover end-user’s information
need
Editor’s Annotation

Freshness-demoted relevance
 Rule-based
hard demotion [Dong WSDM10]
 E.g.,
if the result is somewhat outdated, it should be
demoted by one grade (e.g., from excellent to good)
Correlation:
0.5764±0.6401
Joint Relevance and Freshness Learning
 JRFL: (relevance, freshness) -> click
 Query => trade-off
 URL => relevance/freshness
 Click => overall impression
 Model formalization
 Query-specific latent trade-off
 Linear instantiation
 Associative property
 Relevance/freshness model learning
 Query model learning
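The linear instantiation scores a URL with a query-specific trade-off α_q in [0, 1]; this is a sketch of the scoring function only (learning α_q and the two component models from click preferences is the part the slides describe):

```python
def jrfl_score(alpha_q, relevance, freshness):
    """Linear JRFL scoring sketch: a query-specific convex combination
    of relevance and freshness.  alpha_q near 1 means a relevance-driven
    query; alpha_q near 0 means a freshness-driven query."""
    return alpha_q * relevance + (1.0 - alpha_q) * freshness
```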
Temporal Features
 URL freshness features
 Identify freshness from content analysis
 Query freshness features
 Capture latent preference
Experiments
 Data sets
 Two months' Yahoo! News Search sessions
 Normal bucket: top 10 positions
 Random bucket [Li WSDM11]
 Randomly shuffled top 4 positions
 Unbiased evaluation corpus
 Editors' judgments: 1 day's query log
 Preference pair selection [Joachims SIGIR05]
 Click > skip above
 Click > skip next
 Ordered by Pearson's value
Analysis of JRFL
 Relevance and freshness learning
 Baseline: GBRank trained on Dong et al.'s relevance/freshness annotation set
 Testing corpus: editors' one-day annotation set
Query Weight Analysis

Quantitative Comparison
 Ranking performance on random-bucket clicks

Quantitative Comparison
 Ranking performance on normal clicks

Quantitative Comparison
 Ranking performance on editorial annotations
CTR Distribution Revisited
 Correlation: 0.7163 ± 0.1673
Summary
 Joint Relevance and Freshness Learning
 Query-specific preference
 Learning from query logs
 Temporal features
 Future work
 Personalized retrieval
 Broad spectrum of user's information needs
 E.g., trustworthiness, opinion
Refs
 [Del Corso WWW05] Gianna M. Del Corso, Antonio Gulli, Francesco Romani: Ranking a stream of news. WWW 2005: 97-106
 [Liu WWW09] Tie-Yan Liu: Tutorial on learning to rank for information retrieval. WWW 2009
 [Dong WSDM10] Anlei Dong, Yi Chang, Zhaohui Zheng, Gilad Mishne, Jing Bai, Ruiqiang Zhang, Karolina Buchner, Ciya Liao, Fernando Diaz: Towards recency ranking in web search. WSDM 2010: 11-20
 [Inagaki AAAI10] Yoshiyuki Inagaki, Narayanan Sadagopan, Georges Dupret, Anlei Dong, Ciya Liao, Yi Chang, Zhaohui Zheng: Session Based Click Features for Recency Ranking. AAAI 2010
 [Dong WWW10] Anlei Dong, Ruiqiang Zhang, Pranam Kolari, Jing Bai, Fernando Diaz, Yi Chang, Zhaohui Zheng, Hongyuan Zha: Time is of the essence: improving recency ranking using Twitter data. WWW 2010: 331-340
 [Zhang EMNLP10] Ruiqiang Zhang, Yuki Konda, Anlei Dong, Pranam Kolari, Yi Chang, Zhaohui Zheng: Learning Recurrent Event Queries for Web Search. EMNLP 2010: 1129-1139
 [Chang SIGIR12] Po-Tzu Chang, Yen-Chieh Huang, Cheng-Lun Yang, Shou-De Lin, Pu-Jen Cheng: Learning-based time-sensitive re-ranking for web search. SIGIR 2012: 1101-1102
 [Kanhabua CIKM12] Nattiya Kanhabua, Kjetil Nørvåg: Learning to rank search results for time-sensitive queries. CIKM 2012: 2463-2466
Refs
 [Wang WWW12] Hongning Wang, Anlei Dong, Lihong Li, Yi Chang, Evgeniy Gabrilovich: Joint relevance and freshness learning from clickthroughs for news search. WWW 2012: 579-588
 [Dai SIGIR11] Na Dai, Milad Shokouhi, Brian D. Davison: Learning to rank for freshness and relevance. SIGIR 2011: 95-104
 [Efron SIGIR11] M. Efron and G. Golovchinsky: Estimation methods for ranking recent information. SIGIR 2011: 495-504
 [Li CIKM03] X. Li and W. Croft: Time-based language models. CIKM 2003: 469-475
 [Li WSDM11] L. Li, W. Chu, J. Langford, and X. Wang: Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. WSDM 2011: 297-306
 [Joachims SIGIR05] T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay: Accurately interpreting clickthrough data as implicit feedback. SIGIR 2005: 154-161
Outline
 Time-sensitive search
 Time-sensitive ranking relevance
 Federated search
 Time-sensitive recommendation
Federated Search
 In web search engine results
 To integrate vertical search engine results
 News
 Local
 Shopping
 Finance
 Movie
 Travel
 …
 Also called DD (direct display)

News DD
Critical Challenge
 Understand query intent and surface relevant content
 When to trigger the DD?
 Where to show the DD?
 Maximize user satisfaction subject to business constraints
Proxy for User Satisfaction
 Strong correlation between CTR and newsworthiness [Diaz WSDM09]
 Editors label queries for newsworthiness
 Check the correlation between CTR and the labels
 So user click info can represent a query's newsworthiness
Applicability of Existing Approaches
 Web document ranking?
 CTR is not correlated with query-document relevance
 Query classification?
 Buzzy words change rapidly
 Online model?
 No initial CTR data
 Human labeling is very difficult (if not impossible)
Approach by Konig et al. [Konig SIGIR09]
 Data sources for feature computation
 News corpus
 Blog corpus
 Wikipedia corpus
 …
 7-day data corpus window
 Small enough for main-memory use
 News and blogs complement each other
 Wikipedia is the background corpus
Features
 Corpus frequency features
 Frequency of documents matching the query
 Frequency difference
 Based on news article title and full text
 tf-idf method for query term salience
 Context features
 A breaking-news query usually surfaces similar documents
 On the other hand, "NY Times" returns different stories
 Compute the coherence of the returned documents
Features
 Query-only features
 Ratio of stop words to query length in tokens
 Ratio of special characters
 E.g., www.google.com
 Ratio of capitalized terms
 Check if query terms are capitalized in the news corpus
 E.g., "Casey Anthony"
Leverage Click Feedback
 [Diaz WSDM09]
 CTR can be estimated simply as clicks/views
 But
 Samples are sparse, especially at the initial stage
 Click probability changes over time
 Therefore we need an initial guess
Incorporate Prior Estimation into Click Feedback
 Posterior mean combines the prior estimation with observed clicks/views
 Small μ: sensitive to early user feedback
 Large μ: relies more on the prior estimation
 Aggregate clicks/views from similar queries, weighted by query similarity
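A sketch of the posterior-mean estimate under a Beta-style prior, where μ acts as a pseudo-count (the full model in [Diaz WSDM09] also aggregates evidence from similar queries, which this sketch omits):

```python
def smoothed_ctr(clicks, views, prior_ctr, mu):
    """Posterior-mean CTR: mu pseudo-views at the prior CTR are mixed
    with the observed clicks/views.  Small mu tracks early feedback;
    large mu leans on the prior estimation."""
    return (clicks + mu * prior_ctr) / (views + mu)
```

With no observations the estimate equals the prior; as views accumulate it converges to the empirical CTR, which is exactly the small-μ/large-μ trade-off on the slide.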
Features for Prior Estimation

Click Precision and Recall
 Baseline: contextual model (prior mean)
 Training: use click feedback
Scalability
 Many different verticals
 News, Shopping, Local, Finance, Movie, Travel, …
 [Arguello SIGIR09]: more features
 Many different markets
 US, CA, UK, FR, TW, HK, …
 Need a system that can be applied to all different verticals with minimal effort
 Automatic data generation
 Automatic feature generation
 Automatic model training/evaluation
 Does not rely on editorial data at all
Exploration
 Uniform random exploration over the set of available choices ("actions")
 Action = slotting decision = slot DD 'v' at slot 's', where
 v in V = set of all legally available DDs
 s in S = set of all legally available slots for v; may include NONE
 Features are logged at the same time

Generating Data
 Thus each event in the data is a 4-tuple (a, p, x, r)
 a: result slotted
 p: probability of the action, Pr(v@s)
 x: feature vector
 r: observed reward
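Logged 4-tuples from uniform random exploration support unbiased offline evaluation of a new slotting policy, in the spirit of [Li WSDM11]. A minimal inverse-propensity sketch (the event fields follow the slide; the policy signature is an illustrative assumption):

```python
def ips_value(events, policy):
    """Inverse-propensity estimate of a policy's expected reward from
    randomized exploration logs.  events: (action, prob, features,
    reward) tuples; only events where the policy agrees with the logged
    action contribute, reweighted by 1/prob."""
    total = 0.0
    for a, p, x, r in events:
        if policy(x) == a:
            total += r / p
    return total / len(events)
```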
Features
 Query features
 › Lexical features: bag of words, bigrams, co-occurrence stats, etc.
 › Query attributes: query classification, length, etc.
 Corpus / vertical-level features
 › Query-independent historical CTRs, user preferences, etc.
 Post-retrieval features
 › Query-document match features (ranking scores and features)
 › Global result set features
Summary
 We have introduced
 Two classical papers on news federated search
 The scalability issue
 More issues
 False positives hurt the user experience badly
 More features
Refs
 [Arguello SIGIR09] Jaime Arguello, Fernando Diaz, Jamie Callan, Jean-Francois Crespo: Sources of evidence for vertical selection. SIGIR 2009: 315-322
 [Diaz WSDM09] Fernando Diaz: Integration of news content into web results. WSDM 2009: 182-191
 [Konig SIGIR09] A. Konig, M. Gamon, and Q. Wu: Click-through prediction for news queries. SIGIR 2009
 [Kumar WSDM11] Ashok Kumar Ponnuswami, Kumaresh Pattabiraman, Qiang Wu, Ran Gilad-Bachrach, Tapas Kanungo: On composition of a federated web search result page: using online users to provide pairwise preference for heterogeneous verticals. WSDM 2011: 715-724
 [Kumar WWW11] Ashok Kumar Ponnuswami, Kumaresh Pattabiraman, Desmond Brand, Tapas Kanungo: Model characterization curves for federated search using click-logs: predicting user engagement metrics for the span of feasible operating points. WWW 2011: 67-76
 [Arguello CIKM12] Jaime Arguello, Robert Capra: The effect of aggregated search coherence on search behavior. CIKM 2012: 1293-1302
 [Chen WSDM12] Danqi Chen, Weizhu Chen, Haixun Wang, Zheng Chen, Qiang Yang: Beyond ten blue links: enabling user click modeling in federated web search. WSDM 2012: 463-472
Outline
 Time-sensitive search
 Time-sensitive ranking relevance
 Time-sensitive query suggestion
 Federated search
 Time-sensitive recommendation
Web Recommender Systems

Recommend items to users to maximize
some objective(s)
Outline for Recommendation
 Introduction
 Personalization
 User segmentation
 Action interpretation
 Pairwise preference modeling

[Figure: portal applications on recommendation]
Scientific Discipline
 Machine learning & statistics (for learning user-item affinity)
 Offline models
 Online models
 Collaborative filtering
 Explore/exploit (bandit problems)
 Multi-objective optimization
 Click-rates (CTR), time spent, revenue
 User understanding
 User profile construction
 Content understanding
 Topics, categories, entities, breaking news, …
Some Refs on Previous Research
 Shuang-Hong Yang, Bo Long, Alexander J. Smola, Hongyuan Zha, Zhaohui Zheng: Collaborative competitive filtering: learning recommender using context of user choice. SIGIR 2011: 295-304
 Lihong Li, Wei Chu, John Langford, Xuanhui Wang: Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. WSDM 2011: 297-306
 Wei Chu, Seung-Taek Park: Personalized recommendation on dynamic content using predictive bilinear models. WWW 2009: 691-700
 Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, Xuanhui Wang: Personalized click shaping through lagrangian duality for online recommendation. SIGIR 2012: 485-494
 Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, Xuanhui Wang: Click shaping to optimize multiple objectives. KDD 2011: 132-140
 Deepak Agarwal, Bee-Chung Chen, Bo Long: Localized factor models for multi-context recommendation. KDD 2011: 609-617
 Deepak Agarwal, Bee-Chung Chen: fLDA: matrix factorization through latent dirichlet allocation. WSDM 2010: 91-100
CTR Curves for Dynamic Items
[Figure: each curve is the CTR of an item in the Today Module on www.yahoo.com over time; traffic obtained from a controlled experiment]
Things to note: (a) short lifetimes, (b) temporal effects, (c) often breaking news stories
Solutions
 Online learning
 Content and user interests change fast
 An offline model cannot capture all of the variation
 A large amount of user traffic makes it possible
 Personalization
 More relevant to different users
Online Learning
 Ranking model: updated every 5 minutes on users' feedback
 Exploration & exploitation
 Random bucket (small traffic) for exploration: randomly shuffle the ranking of all candidates
 Serving bucket for exploitation: models -> scores -> ranking
Online Learning Flowchart
Per-Item Model
 Each item has a corresponding model
 For example, the estimated most popular (EMP) model
 Click probability is estimated from the observed clicks and views, taking the sample size into account
Outline for Recommendation
 Introduction
 Personalization
 User segmentation
 Action interpretation
 Pairwise preference modeling
Personalization
 CTR by gender: Female 0.24, Male 0.39
 CTR by query category and gender:
 Family: Female 0.34, Male 0.32
 Sports: Female 0.16, Male 0.37
 Tech and Gadgets: Female 0.21, Male 0.44
 DMA with the highest CTR:
 "SF Giants": San Francisco-Oakland-San Jose
 "Oregon vs. UCLA": Portland
 "Texas Rangers": Dallas-Ft. Worth
[Chart: CTR by age]
 CTRs are relative
Personalization Model (I)
 User segmentation
 Pre-define a few user segments by user features (e.g., age-gender)
 For each user segment, apply EMP
Personalization Model (II)
 Online logistic regression (OLR)
 p(click) = σ(w_0 + w_1 x_1 + w_2 x_2 + …)
 w_0: intercept term, representing the most-popular score
 w_1, w_2, …: feature weights
 x_1, x_2, …: binary user features
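A sketch of one OLR update on a single impression (plain SGD on the logistic loss; the learning rate and update rule are illustrative assumptions, not the tutorial's exact algorithm):

```python
import math

def olr_update(w, features, clicked, lr=0.05):
    """One online logistic-regression step on a single impression.
    w: dict of weights ("_intercept" plays the most-popular-score role);
    features: iterable of active binary user-feature names."""
    z = w.get("_intercept", 0.0) + sum(w.get(f, 0.0) for f in features)
    p = 1.0 / (1.0 + math.exp(-z))      # predicted click probability
    g = (1.0 if clicked else 0.0) - p   # gradient of the log-likelihood
    w["_intercept"] = w.get("_intercept", 0.0) + lr * g
    for f in features:
        w[f] = w.get(f, 0.0) + lr * g
    return p
```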
Trending Now Module: Query Recommendation

Query Buzz Computation
 n-gram based
 Uses LM scores based on search queries, queries triggering the News DD, and news headlines
 Computes the likelihood of the n-grams in a query under:
 The model for the current hour/window
 The same hour/window in the previous day
 The same hour/window in the previous week
 The same hour/window in the previous month
GEO Feature [Bawab KDD12]
 Query based
 Uses the queries in the TimeSense dictionary
 Aggregates local counts on a fixed window of 24 hrs (model for the current hour)
GEO Capabilities
 DMA: Designated Market Area (Nielsen)
 Top 50 US DMAs
 Log data contains the WOEID/DMA for each query
[Chart: query count by DMA]
GEO Model
 Entropy of the query over DMAs: H(q) = -Σ_d p(d|q) log p(d|q)
 Posterior probability: normalizes across DMAs and favors larger ones
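The entropy can be computed directly from per-DMA query counts (base-2 logs are an illustrative choice):

```python
import math

def dma_entropy(dma_counts):
    """Entropy of a query's count distribution over DMAs: low entropy
    means the query's traffic concentrates in a few DMAs (geo-local);
    high entropy means it is spread across the country."""
    total = sum(dma_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in dma_counts.values() if c > 0)
```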
Time-Sensitive vs. Geo-Sensitive
 Examples (buzzy and local):

| Query | Count | Buzz | Entropy | Top DMA nProb |
| ringwood nj murder | 67 | 0.7024 | 0.8546 | New York = 0.84, Philadelphia = 0.06 |
| tom torlakson | 73 | 0.8506 | 2.3704 | Los Angeles = 0.15, San Fran = 0.16, Sacramento = 0.36, San Diego = 0.21 |
| justice jorge labarga | 66 | 0.7014 | 2.4733 | Miami = 0.19, Tampa = 0.17, Orlando = 0.26, Jacksonville = 0.29 |
| gulf coast claims facility | 626 | 0.5037 | 1.1892 | New Orleans = 0.86 |
| drew brees baby | 312 | 0.4068 | 0.9781 | New Orleans = 0.89 |
Outline for Recommendation
 Introduction
 Personalization
 User segmentation
 Action interpretation
 Pairwise preference modeling
User Segmentation
 Baseline: heuristic rule
 E.g., by age-gender
 User behavior information can better reflect users' interests
 Users with similar behavior patterns are more likely to have similar interests
 Describing user behaviors: behavior targeting (BT) features
Action Interpretation for User Segmentation
 User segmentation:
 Use selected features to describe each user
 Apply clustering methods:
 K-means
 Tensor segmentation [Chu KDD09]
[Bian TKDE]
Tensor Segmentation Result

Offline Evaluation
 Editorial judging is infeasible
 Instead, measure the correlation between actual clicks and prediction rankings
[Example: per-event Precision@1/2/3 values, e.g., 1/1/1 vs. 0/0/1]

Compare User Segmentation Approaches
Outline for Recommendation
 Introduction
 Personalization
 User segmentation
 Action interpretation
 Pairwise preference modeling
Action Interpretation for Online Learning
 The user is not engaged in every module
 Three event categories
 Click event
 The user clicked one or more items in the module: useful
 Click-other event
 Contains at least one user action on other modules: not useful
 Non-click event
 The user has no click action on any module
 Not obvious whether the user examined the module
 We can check the user's historic behaviors on this module
User Engagement on Non-Click Events

Remove Click-Other Events
Outline for Recommendation
 Introduction
 Personalization
 User segmentation
 Action interpretation
 Pairwise preference modeling
Pairwise Preference Learning
 Reality: multiple items are displayed at one time
 In one event: a user clicks Item A and does not click Item B
 Per-item model interpretation: "Item A was clicked once; Item B was viewed-only once."
 Preference interpretation: "the user liked Item A better than Item B."
[Bian TIST]
Another Example
 User 1: clicks Item A, does not click Item B
 User 2: clicks Item C, does not click Item D
 By the per-item model:
 CTR(A) = 1; CTR(B) = 0; CTR(C) = 1; CTR(D) = 0
 A = C > B = D (wrong due to limited observations)
 The facts are only: A > B and C > D

Learning Sample Sparsity
 Many users never really examine the module
 Candidate pool size >> display number
 Personalization makes it even worse
Our Approach for Sample Sparsity
 Use pairwise preferences for learning
 Can better deal with the sparsity problem
 More straightforward for producing the final ranking
 A proven effective approach in the search ranking problem
 Two algorithms
 Graph-based pairwise learning
 Bayesian pairwise learning
Preference Extraction
 User 1: clicks Item A; no click on Items B, C, D
 Preferences: A > B; A > C; A > D
 User 2: clicks Items C and A; no click on Items D and B
 Preferences: C > D; C > B; A > D; A > B
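The extraction rule above (each clicked item is preferred over each unclicked item in the same event) can be sketched as:

```python
def extract_preferences(event):
    """event: list of (item, clicked) pairs from one display event.
    Returns (winner, loser) preferences: every clicked item is
    preferred over every unclicked item in the same event."""
    clicked = [item for item, c in event if c]
    skipped = [item for item, c in event if not c]
    return [(a, b) for a in clicked for b in skipped]
```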
Graph-Based Pairwise Learning
 Borrow the PageRank idea
 User 1 preferences: A > B; A > C; A > D
 User 2 preferences: C > D; C > B; A > D; A > B
[Figure: preference graph over items A, B, C, D, with edges from losing to winning items weighted by preference counts]
Bayesian Pairwise Learning
 Bayesian hidden score (BHS) model
 Latent item attractiveness/relevance; observed preference strength
 Preference distribution
 Attractiveness distribution

Model Optimization
 Likelihood function
 Final task
 Optimization: stochastic gradient descent algorithm
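A minimal sketch of learning hidden scores from pairwise preferences by stochastic gradient descent (a logistic preference likelihood stands in for the BHS distributions, which the slides do not spell out):

```python
import math

def fit_hidden_scores(prefs, lr=0.1, epochs=200):
    """Learn a latent attractiveness score per item so that
    s[winner] > s[loser] for each observed preference, via SGD on a
    logistic preference likelihood."""
    s = {}
    for _ in range(epochs):
        for winner, loser in prefs:
            d = s.get(winner, 0.0) - s.get(loser, 0.0)
            g = 1.0 / (1.0 + math.exp(d))  # d/dd of log sigmoid(d)
            s[winner] = s.get(winner, 0.0) + lr * g
            s[loser] = s.get(loser, 0.0) - lr * g
    return s
```

Because one observed preference updates two items at once, a few events already impose an ordering over many items, which is why the pairwise view copes better with sparse samples.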
Sample Sparsity Effect
 Trending Now data
 Remove learning samples and compare:
 Per-item model decline
 Preference model decline
 Conclusion
 The fewer the samples, the more effective the preference-learning approach
Summary
 We have introduced
 Time-sensitive + geo-sensitive recommendation
 User segmentation
 Action interpretation
 Pairwise learning
 We have NOT introduced
 Many failed efforts
 Many lessons
 Appropriate features and sampling are extremely critical in practice
Refs
 [Bian TKDE] Jiang Bian, Anlei Dong, Xiaofeng He, Srihari Reddy, Yi Chang: User action interpretation for personalized content optimization in recommender systems. IEEE Transactions on Knowledge and Data Engineering, to appear.
 [Bawab KDD12] Ziad Al Bawab, George H. Mills, Jean-Francois Crespo: Finding trending local topics in search queries for personalization of a recommendation system. KDD 2012: 397-405
 [Bian TIST] Jiang Bian, Bo Long, Lihong Li, Anlei Dong, Yi Chang: Exploiting User Preferences for Online Learning in Recommender Systems. Submitted to ACM Transactions on Intelligent Systems and Technology (TIST).
Summary & Resources
WSDM 2013 Tutorial
Summary and Other Venues
 Wikipedia page
 http://en.wikipedia.org/wiki/Temporal_information_retrieval
 Workshops
 TempWeb WWW'13
 International Workshop on Big Data Analytics for the Temporal Web (2012)
 Time-Aware Information Access (TAIA) associated with SIGIR'12
 Temporal Web Analytics Workshop associated with WWW 2011
 TERQAS (Time and Event Recognition for Question Answering Systems) workshops
 Workshop on Web Search Result Summarization and Presentation associated with WWW 2009
 Workshop on Temporal Data Mining associated with ICDM 2005
 Workshop on Text Mining associated with KDD 2000
 TREC
 Temporal Summarization Track
 Microblog Track