Artificial Intelligence Towards mobile intelligence: Learning from GPS history data for

Artificial Intelligence 184–185 (2012) 17–37
Artificial Intelligence
www.elsevier.com/locate/artint
Towards mobile intelligence: Learning from GPS history data for collaborative recommendation
Vincent W. Zheng a,∗, Yu Zheng b, Xing Xie b, Qiang Yang a
a Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
b Microsoft Research Asia, Building 2, No. 5 Danling Street, Haidian District, Beijing 100080, PR China
Article info
Article history:
Received 3 January 2011
Received in revised form 18 February 2012
Accepted 20 February 2012
Available online 21 February 2012
Keywords:
GPS
Location
Activity
Mobile recommendation
Collaborative filtering
Personalization

Abstract
With the increasing popularity of location-based services, we have accumulated a lot of
location data on the Web. In this paper, we are interested in answering two popular
location-related queries in our daily life: (1) if we want to do something such as
sightseeing or dining in a large city like Beijing, where should we go? (2) If we want
to visit a place such as the Bird’s Nest in Beijing Olympic park, what can we do there?
We develop a mobile recommendation system to answer these queries. In our system,
we first model the users’ location and activity histories as a user–location–activity rating
tensor.¹ Because each user has limited data, the resulting rating tensor is very sparse,
which makes our recommendation task difficult. In order to address this data
sparsity problem, we propose three algorithms² based on collaborative filtering. The first
algorithm merges all the users’ data together and uses a collective matrix factorization
model to provide general recommendations (Zheng et al., 2010 [3]). The second algorithm
treats each user differently and uses a collective tensor and matrix factorization model to
provide personalized recommendations (Zheng et al., 2010 [4]). The third algorithm is a
new algorithm that further improves on our previous two algorithms by using a ranking-based collective tensor and matrix factorization model. Instead of trying to predict the
missing entry values as accurately as possible, it directly optimizes the ranking
loss w.r.t. user preferences on the locations and activities. It is therefore more consistent
with our ultimate goal of ranking locations/activities for recommendations. For these three
algorithms, we also exploit additional information, such as user–user similarities,
location features, activity–activity correlations and user–location preferences, to help the
CF tasks. We extensively evaluate our algorithms using a real-world GPS dataset collected
by 119 users over 2.5 years. We show that all three of our algorithms consistently
outperform the competing baselines, and that our newly proposed third algorithm also
outperforms our two previous algorithms.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
As mobile devices with positioning functions, such as GPS-phones, become more and more popular, people are now able
to find locations more easily. Based on these location data, various location-based services (LBS) are provided on the Web
∗ Corresponding author.
E-mail addresses: vincentz@cse.ust.hk (V.W. Zheng), yuzheng@microsoft.com (Y. Zheng), xing.xie@microsoft.com (X. Xie), qyang@cse.ust.hk (Q. Yang).
¹ A “tensor” is a multi-dimensional array (Symeonidis et al., 2008 [1]; Cichocki et al., 2009 [2]).
² This work extends our previous work (Zheng et al., 2010 [3,4]). We propose a new model in Section 5.3 and completely rerun the experiments for all three of our algorithms.
doi:10.1016/j.artint.2012.02.002
Fig. 1. GPS data management services.
and shown to be quite attractive to users [5–7]. People can now share on the Web not only their raw GPS coordinates and
time stamps, for example, for cycling route exchange,³ but also rich text such as comments and pictures related to their
trip trajectories for social blogging. In Fig. 1, we show such a location data management service from GeoLife [8], which
allows users to share annotated GPS trajectories on the Web. Consider one example from this figure: after traveling around
the Forbidden City in Beijing, a user tries to share some travel experiences on the Web. He then uploads his GPS trajectory
of this trip, and also annotates it by attaching some interesting comments⁴ (depicted as small pink boxes, each unfolding
as a text box) about what he was doing, what he saw or how he felt about the places, and other useful information.
Hopefully, such comments bring rich semantics to GPS trajectories and make it easier for mobile users to share their travel
experiences. We expect to take such partially annotated GPS location data from many mobile users as input, and extract
useful knowledge about the locations and user activities. For example, which locations are popular, and what activities are
suitable at some places? Our goal is to utilize crowd wisdom encoded in their location histories to provide useful mobile
recommendations. In particular, we are interested in collaborative location and activity recommendations, which can
give both location recommendations for a given activity query and activity recommendations for a given location query. Here,
“activity” can refer to various human behaviors such as dining, shopping, watching movies/shows, enjoying sports/exercises,
tourism, and the like.
To accomplish this collaborative recommendation task, we extract location and activity information from the GPS history
data for each user and formulate the recommendation problem as a collaborative filtering problem on the user–location–
activity data input. We propose three collaborative filtering (CF) algorithms that rely on collective tensor and/or matrix
factorization to address the data sparsity problem in recommendation:
1. A collaborative location and activity filtering (CLAF) algorithm [3], which merges all the users’ data together, and uses a
collective matrix factorization model to provide general recommendations.
2. A personalized collaborative location and activity filtering (PCLAF) algorithm [4], which treats each user differently and uses
a collective tensor and matrix factorization to provide personalized recommendations.
3. A ranking-based personalized collaborative location and activity filtering (RPCLAF) algorithm, which formulates each user’s
pairwise preferences on the locations/activities and uses a ranking-based collective tensor and matrix factorization
model to provide personalized recommendations.
We extract some auxiliary information to help the CF tasks. Such information includes the location features from the
POI (points of interest) database, the activity–activity correlations from the Web, the user–user similarities from the user
demographics database and the user–location preferences from the GPS trajectory data. We show that our algorithms can
naturally transfer knowledge from this auxiliary information to help prediction in the target domain where location–activity
rating data are sparse for the users. Among our three algorithms, the first two (i.e. CLAF and PCLAF) use square loss as
³ http://www.bikely.com/.
⁴ We leave the use of picture information for future work.
Fig. 2. Illustration of location and activity recommendation.
Fig. 3. Missing values in the user–location–activity tensor A.
the optimization criterion. Specifically, their models aim to generate user–location–activity rating predictions that are as
close as possible to the ground-truth values. They then use these predictions to rank the locations/activities for recommendation.
Unlike these two algorithms, our third algorithm RPCLAF uses a ranking loss for optimization and models
the users’ pairwise preferences on the locations/activities. The reason is that a ranking loss is
more consistent with the ultimate goal of ranking locations/activities for recommendation. In addition, it benefits from
modeling more information with the same number of parameters as the PCLAF algorithm. We use stochastic gradient descent
to solve the optimization problems in our algorithms, and fill all the missing values of the user–location–activity tensor with
reasonable predictions. By ranking the entries of each user’s filled location–activity matrix (i.e. a slice of the user–
location–activity tensor), we can provide both location recommendations and activity recommendations. Finally, we evaluate
our system using a real-world GPS dataset collected by 119 users over 2.5 years. The dataset contains
around 4 million GPS points covering a total distance of over 139,310 kilometers.
2. Problem statement
From the GPS data, we can extract three entities, i.e. users, locations and activities, denoting that some user visited
some place and did something there. We propose to model such user–location–activity relations in a 3-D tensor, with each
dimension corresponding to one of these entities. In particular, we denote such a tensor as $\mathcal{A} \in \mathbb{R}^{m \times n \times r}$, where m is the number
of users, n is the number of locations and r is the number of activities. An entry $a_{ijk}$ in $\mathcal{A}$ then denotes the frequency of
user i visiting location j and doing activity k there. When the tensor $\mathcal{A}$ is complete, we can easily extract each user’s
location–activity matrix as a slice of the tensor and make recommendations based on its ratings. As shown in Fig. 2,
location recommendation for a given activity query can be seen as a ranking over the row entry values in the corresponding column,
and activity recommendation as a ranking over the column entry values in the corresponding row.
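The two query types can be made concrete with a minimal Python sketch. It assumes the tensor is already complete and stored as nested lists indexed `tensor[user][location][activity]`. The function names and the toy numbers are hypothetical illustrations, not part of the authors' system.

```python
from typing import List

Tensor = List[List[List[float]]]  # tensor[user][location][activity]


def recommend_locations(tensor: Tensor, user: int, activity: int, top_k: int) -> List[int]:
    """Location query: rank the entries of one column (an activity) of the user's slice."""
    slice_ = tensor[user]  # the user's location-activity matrix
    order = sorted(range(len(slice_)), key=lambda loc: slice_[loc][activity], reverse=True)
    return order[:top_k]


def recommend_activities(tensor: Tensor, user: int, location: int, top_k: int) -> List[int]:
    """Activity query: rank the entries of one row (a location) of the user's slice."""
    row = tensor[user][location]
    order = sorted(range(len(row)), key=lambda act: row[act], reverse=True)
    return order[:top_k]


# Toy complete tensor for a single user: 3 locations x 3 activities (illustrative numbers).
T = [[[5, 1, 0],
      [2, 4, 1],
      [3, 0, 2]]]
print(recommend_locations(T, user=0, activity=0, top_k=2))   # [0, 2]
print(recommend_activities(T, user=0, location=1, top_k=2))  # [1, 0]
```

Python's `sorted` is stable, so ties keep their index order. That is an arbitrary but deterministic tie-breaking choice.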
As discussed above, recommendation is based on rankings over the complete location–activity matrix. However, in
practice, each user’s location–activity matrix can be sparse due to the limited number of annotations. Therefore, we
expect many missing entries in each user’s location–activity matrix, as shown in Fig. 3. Our task is to build a
model that predicts a reasonable ranking of these missing entries from the observed
entries in the tensor $\mathcal{A}$.
There are several ways to accomplish this task. A general idea is to first predict the values of the missing entries, and
then rank locations and activities for recommendation based on the predicted values. As each user has limited
location–activity ratings, it is natural to merge all the users’ ratings together in order to get a denser location–
activity matrix. Collaborative filtering can then be used to fill the missing entries in the matrix. Our first algorithm,
CLAF, is based on this intuition. It also exploits the location features and activity correlations as auxiliary information
(as discussed later) to further alleviate the data sparsity problem. It relies on a collective matrix factorization model to
achieve collaborative filtering with the auxiliary information sources. Our CLAF algorithm has been shown to work well in
practice [3], but it can provide only general recommendations. In order to provide personalized location and activity
recommendations, we propose the second algorithm, PCLAF, which directly models the users and employs a user–location–
activity tensor for CF [4]. It also uses the auxiliary location and activity information, together with the user similarities and
user–location preferences. It relies on a collective tensor and matrix factorization model to solve the CF problem.
Fig. 4. GPS trajectory and stay point.
Our third algorithm tries to solve the CF problem from a different perspective. Since the recommendation task is
essentially a ranking problem, first predicting the missing values and then ranking them, as the previous two algorithms do,
takes an indirect route to the problem. We can therefore develop a model that directly uses a
ranking loss as the objective function and focuses only on the users’ pairwise preferences among locations/activities.
Our newly proposed RPCLAF algorithm offers such a model. It uses ranking-based collective tensor and matrix factorization
to incorporate the auxiliary information, and we show that it achieves better ranking performance, since its objective function is
more consistent with ranking.
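The argument that square loss is only an indirect proxy for ranking can be seen in a toy example. This example is ours, not the paper's. Two predictions with identical squared error can order the items differently. The `misordered_pairs` count is a simple pairwise ranking error used here for illustration. It is an assumption and not the RPCLAF objective itself.

```python
def squared_loss(pred, truth):
    """Sum of squared errors: the kind of criterion used by CLAF and PCLAF."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth))


def misordered_pairs(pred, truth):
    """Number of ordered pairs (a, b) with truth[a] > truth[b] but pred[a] <= pred[b]."""
    n = len(truth)
    return sum(
        1
        for a in range(n)
        for b in range(n)
        if truth[a] > truth[b] and pred[a] <= pred[b]
    )


truth = [3, 2]    # the user prefers item 0 over item 1
pred_a = [2, 3]   # each value is off by 1, and the order is flipped
pred_b = [4, 1]   # each value is also off by 1, and the order is kept

print(squared_loss(pred_a, truth), misordered_pairs(pred_a, truth))  # 2 1
print(squared_loss(pred_b, truth), misordered_pairs(pred_b, truth))  # 2 0
```

A square-loss learner has no reason to prefer `pred_b` over `pred_a`. A ranking-based loss does.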
In general, there are two main categories of CF techniques. One is memory-based, using the rating data to measure the
similarity between the matrix entities of interest [9]. These similarity values are then used to produce a prediction as
a weighted average of the existing ratings. The other is model-based, relying on matrix factorization to uncover latent factors
that explain the observed ratings. The latent factors are then used to reconstruct the incomplete matrix and thus produce the
rating predictions [10]. All three of our CF algorithms are model-based.
3. Overview of our system
In this section, we first clarify some terms used in this paper. Then, we discuss the application scenarios and the architecture of our system.
3.1. Preliminary
First, we clarify some terms, including GPS trajectory (Traj), stay point (s) and stay region (r).
Definition 1 (GPS trajectory). A user’s trajectory Traj is a sequence of time-stamped points: $Traj = p_0, p_1, \ldots, p_k$, where a
GPS point $p_i = (x_i, y_i, t_i)$, $\forall\, 0 \le i < k$, with $t_i$ as a timestamp ($t_i < t_{i+1}$) and $(x_i, y_i)$ as the two-dimensional coordinates [11].
In the right part of Fig. 4, we show a trajectory consisting of 7 GPS points.
Definition 2 (Stay point). A stay point s stands for a geographical region where a user stayed longer than a time threshold $T_r$ within
a distance threshold $D_r$. Denote $Dist(p_i, p_j)$ as the geospatial distance between two points $p_i$ and $p_j$, and $Int(p_i, p_j) = |p_i.t_i - p_j.t_j|$ as their time interval. In a user’s trajectory, s can be seen as a virtual location characterized by a set of
consecutive GPS points $P = \{p_m, p_{m+1}, \ldots, p_n\}$, where $\forall m < i \le n$, $Dist(p_m, p_i) \le D_r$, $Dist(p_m, p_{n+1}) > D_r$ and $Int(p_m, p_n) \ge T_r$. Hence, a stay point $s = (x, y, t_a, t_l)$, where
$$s.x = \sum_{i=m}^{n} p_i.x / |P|, \qquad s.y = \sum_{i=m}^{n} p_i.y / |P| \qquad (1)$$
respectively stand for the average x and y coordinates of the collection P; $s.t_a = p_m.t_m$ is the user’s arrival time at s and
$s.t_l = p_n.t_n$ is the user’s leaving time [11].
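Definition 2 and Eq. (1) can be read directly as a scan over one trajectory. The following is a minimal Python sketch in the spirit of the stay point detection in [11], not the authors' code. Three things are assumptions: points carry projected planar coordinates in meters, Euclidean distance stands in for the geospatial $Dist$, and a stay that runs to the end of the trajectory (so no $p_{n+1}$ exists) is still accepted.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GPSPoint:
    x: float  # planar coordinates in meters (assumption: already projected)
    y: float
    t: float  # timestamp in seconds


@dataclass
class StayPoint:
    x: float         # mean x of the points in P, Eq. (1)
    y: float         # mean y of the points in P, Eq. (1)
    t_arrive: float  # s.t_a = p_m.t_m
    t_leave: float   # s.t_l = p_n.t_n


def dist(p: GPSPoint, q: GPSPoint) -> float:
    # Euclidean distance stands in for the geospatial Dist(p_i, p_j).
    return math.hypot(p.x - q.x, p.y - q.y)


def detect_stay_points(traj: List[GPSPoint], d_r: float, t_r: float) -> List[StayPoint]:
    stays: List[StayPoint] = []
    m, k = 0, len(traj)
    while m < k:
        # Grow P = p_m..p_n while every point stays within d_r of the anchor p_m.
        j = m + 1
        while j < k and dist(traj[m], traj[j]) <= d_r:
            j += 1
        # traj[j], if it exists, is p_{n+1} with Dist(p_m, p_{n+1}) > d_r;
        # at the end of the trajectory there is no p_{n+1} (accepted, an assumption).
        n = j - 1
        if traj[n].t - traj[m].t >= t_r:  # Int(p_m, p_n) >= T_r
            pts = traj[m:j]
            stays.append(StayPoint(
                x=sum(p.x for p in pts) / len(pts),
                y=sum(p.y for p in pts) / len(pts),
                t_arrive=traj[m].t,
                t_leave=traj[n].t,
            ))
            m = j  # continue scanning after the detected stay
        else:
            m += 1
    return stays


# Toy 7-point trajectory: about 22 minutes near the origin, then moving on.
traj = [GPSPoint(0, 0, 0), GPSPoint(10, 0, 60), GPSPoint(20, 0, 120),
        GPSPoint(15, 5, 1300), GPSPoint(500, 0, 1400),
        GPSPoint(1000, 0, 1500), GPSPoint(1005, 0, 1600)]
for s in detect_stay_points(traj, d_r=200, t_r=1200):
    print(s)  # StayPoint(x=11.25, y=1.25, t_arrive=0, t_leave=1300)
```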
Compared with raw GPS points, stay points are more meaningful in representing the locations where a user stays, since they capture
the time duration and vicinity information, and they are commonly used as the basic units for representing GPS data
[11,12]. However, in practice, when we consider many GPS trajectories together, we may find that some stay points refer
to the same region of interest. This is because users can stay in different parts (e.g. the west and east wings) of a
region of interest (e.g. the Bird’s Nest stadium). For recommendation, we focus on a whole region of interest such as the
Bird’s Nest rather than its two wings, so we further extract geographical regions by clustering nearby stay
points. We call these stay regions.
Definition 3 (Stay region (location)). Given all the stay points extracted from the GPS data as $S = \{s_1, s_2, \ldots, s_N\}$ and a
clustering algorithm $Alg(S)$ taking S as input, a stay region r is a geographic region that contains a set of stay
points $S' = \{s_m, s_{m+1}, \ldots, s_n \mid s_i \in S, \forall m \le i \le n\}$ belonging to the same cluster. Hence, a stay region $r = (x, y)$, where
$$sr.x = \sum_{i=m}^{n} s_i.x / |S'|, \qquad sr.y = \sum_{i=m}^{n} s_i.y / |S'| \qquad (2)$$
stand for the average x and y coordinates of the collection $S'$. In this work, stay regions are used as the basic units for
location recommendation, i.e. when we recommend locations, we in fact recommend stay regions.
Fig. 5. User interface for our system.
In [3], we instantiate Alg as a grid-based clustering algorithm. The basic idea is to divide the map into grids, and employ
a greedy algorithm to iteratively assign the grid with the maximal number of stay points and its neighboring grids to the same
cluster. Note that we do not directly extract stay regions by clustering the raw GPS points from all the trajectories. This is
because mixing the raw GPS points from different trajectories together loses their sequential information, which makes it
hard to detect any meaningful stays. Interested readers may refer to our previous work [3] for more details. Compared with
clustering algorithms such as the classic k-means algorithm and the density-based OPTICS clustering algorithm [13],
which do not constrain the output cluster sizes, our grid-based clustering algorithm ensures that the recommended
locations are not too large for users to find their destinations. However, we do not claim that the stay regions
found by our grid-based clustering algorithm are better than those found by other clustering algorithms in
terms of other metrics such as cluster coherence or density awareness.
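The greedy grid clustering from [3] can be sketched in Python as follows, together with the stay-region centers of Eq. (2). This is our reading of the one-paragraph description. The cell size, the 8-neighborhood and tie-breaking by cell index are assumptions, not details given in the text. Because each cluster spans at most 3×3 cells, the region size stays bounded, which is the property the paragraph emphasizes.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def grid_cluster(points: List[Point], cell_size: float) -> List[int]:
    """Greedy grid clustering: return a cluster id for each stay point."""
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cells)
    cluster_of: Dict[Tuple[int, int], int] = {}
    next_id = 0
    # Visiting cells in descending count order is equivalent to repeatedly picking
    # the unassigned cell with the most stay points (counts never change).
    for (cx, cy), _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if (cx, cy) in cluster_of:
            continue
        # The seed cell and its unassigned 8-neighbors form one cluster (at most 3x3 cells).
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nb = (cx + dx, cy + dy)
                if nb in counts and nb not in cluster_of:
                    cluster_of[nb] = next_id
        next_id += 1
    return [cluster_of[c] for c in cells]


def stay_region_centers(points: List[Point], labels: List[int]) -> Dict[int, Point]:
    """Eq. (2): average the x and y coordinates of the stay points in each cluster."""
    sums: Dict[int, Tuple[float, float, int]] = {}
    for (x, y), c in zip(points, labels):
        sx, sy, k = sums.get(c, (0.0, 0.0, 0))
        sums[c] = (sx + x, sy + y, k + 1)
    return {c: (sx / k, sy / k) for c, (sx, sy, k) in sums.items()}


pts = [(10, 10), (20, 20), (30, 30), (150, 50), (1000, 1000)]
labels = grid_cluster(pts, cell_size=100)
print(labels)                            # [0, 0, 0, 0, 1]
print(stay_region_centers(pts, labels))  # {0: (52.5, 27.5), 1: (1000.0, 1000.0)}
```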
3.2. Application scenarios and architecture
The work reported in this paper is an important component of our GeoLife project, whose prototype has been internally
accessible within Microsoft since Oct. 2007. So far, 119 individuals have used this system.
Fig. 5 shows our system’s user interface. It is organized as a website (similar to a search engine) so that both PCs and
hand-held devices can access it. To use our system, a user can choose to log in to get personalized recommendations or use it without logging in to get general recommendations. Then, for activity recommendation, the user can input a location,
such as “Bird’s Nest”, as a location query; our system then shows the queried location on the map and suggests a ranked
list of activities (top five here). The user can provide feedback about the results by giving ratings. For location
recommendation, the user can input an activity, such as “tourism and amusement”, as an activity query; our system then
suggests a ranked list of candidate locations (top ten here) and displays them on the map, so that the user can zoom in
on the map and get more details (e.g. transportation). The user can also view the location candidates ranked below the top ten
to get more recommendations. Similarly, the user can provide feedback on location recommendations.
In terms of system architecture, the back-end of our recommendation system consists of four parts. First, it takes raw GPS data
as input and processes them to get meaningful stay regions as the locations of interest for recommendation. Second, it takes
the user comments as input to extract useful activity information for each location of interest. Third, it extracts
auxiliary knowledge such as user similarities, location features and activity correlations. Fourth, it trains a recommender
based on one of our collaborative filtering algorithms. In the front-end, our system provides an interface so that
users can access the recommender through the Internet using laptops/PCs or PDAs/smart-phones, and submit queries (i.e.
activity or location names). Our system then returns a ranked list of locations or activities for the given activity or location
query.
Fig. 6. An example of activity information extraction.
4. Data modeling
In this section, we introduce how to model the location–activity data for training the recommender. We also introduce how to extract auxiliary information such as location features, activity correlations, user similarities and user–location
preferences as additional inputs.
4.1. Activity information extraction
We rely on user-generated text comments to build the user–location–activity tensor. Stay region extraction
gives us a set of stay regions. For each stay region $sr_i$ in the stay region set $R = \{sr_i, 1 \le i \le |R|\}$, we first extract the
comments attached to the GPS data in this stay region. Consider the example shown in Fig. 6. A user visited the Forbidden
City and attached a comment to the GPS trajectory on the map, saying that “We took a tour bus to look around along
the forbidden city moat . . . ”. From the GPS coordinates, we can identify the stay region as “Forbidden City”. Then, from the
comment content, we can infer that the user was pursuing “Tourism”. One such comment gives a rating of “1” to the
location–activity pair (“Forbidden City”, “Tourism”). By parsing all the comments, we can count the various activities at each
stay region (location) for each user. Let us denote an r-dimensional count vector $\mathbf{a}_j^{(i)} = [a_{j1}^{(i)}, a_{j2}^{(i)}, \ldots, a_{jr}^{(i)}]$, where each $a_{jk}^{(i)}$ is
the number of times activity k was performed at location j by user i. The user–location–activity tensor
$\mathcal{A}$ then has its entries defined as:
$$\mathcal{A}_{ijk} = a_{jk}^{(i)}, \quad \forall i = 1, \ldots, m;\ j = 1, \ldots, n;\ k = 1, \ldots, r. \qquad (3)$$
An entry $\mathcal{A}_{ijk} = 0$ means that we do not observe any comment in the data indicating that user i performed
activity k at location j. We treat these zero entries as missing values, in the sense that the user may still be interested in
doing that activity at that location even though we have not observed any indication so far.
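Eq. (3) and the missing-value convention map naturally onto a sparse representation: store only observed counts and treat every absent key as missing. The following is a minimal Python sketch. It assumes the comments have already been labeled as (user, location, activity) name triples, for example by human labelers. The function and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def build_sparse_tensor(annotations: Iterable[Tuple[str, str, str]]):
    """Count (user, location, activity) annotations into a sparse tensor.

    Only observed (non-zero) entries are stored; any absent key is a missing value.
    """
    users: Dict[str, int] = {}
    locations: Dict[str, int] = {}
    activities: Dict[str, int] = {}

    def index(table: Dict[str, int], name: str) -> int:
        # Assign consecutive integer ids in order of first appearance.
        return table.setdefault(name, len(table))

    tensor: Counter = Counter()
    for user, loc, act in annotations:
        tensor[(index(users, user), index(locations, loc), index(activities, act))] += 1
    return tensor, users, locations, activities


labels = [("u1", "Forbidden City", "Tourism"),
          ("u1", "Forbidden City", "Tourism"),
          ("u2", "Bird's Nest", "Tourism"),
          ("u1", "Bird's Nest", "Sports and exercise")]
A, users, locs, acts = build_sparse_tensor(labels)
print(A[(0, 0, 0)])    # 2: u1 commented twice on tourism at the Forbidden City
print((1, 0, 0) in A)  # False: (u2, Forbidden City, Tourism) is a missing entry
```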
Note that, in this study, we use human labelers to parse the user-generated comments to get the activity labels. In
general, however, as the user comments are essentially text, one can use text classification to detect the activities automatically. For
example, Nigam et al. provide an approach that uses both labeled and unlabeled text data for classification [14]. In this way, the
human labeling cost could be greatly reduced and activity extraction would become more scalable. We leave this as future
work.
4.2. Location-feature extraction
We use the POI category database to get the counts of different POIs in a region of interest. In particular,
given a stay region $sr_i \in R$, we count the number of different POIs in an enclosing rectangle of the stay points in $sr_i$, with
the coordinates $[sr_i.lat - d_s/2,\ sr_i.lat + d_s/2] \times [sr_i.lng - d_s/2,\ sr_i.lng + d_s/2]$. Here, $d_s$ is the size parameter, which is
set to 500 meters in this paper. Interested readers are referred to our previous work [3] for more experimental details
on this parameter. The enclosing rectangle is therefore of size $d_s \times d_s$. Denote the count vector for a location j as
$\mathbf{c}_j = [c_{j1}, c_{j2}, \ldots, c_{jp}]$ for p types of POIs. Since some types of POIs (e.g. restaurants) are more common than others
(e.g. movie theaters), we follow information retrieval practice and normalize these counts with term frequency–inverse
document frequency (TF-IDF) [15] to obtain a location–feature matrix $C \in \mathbb{R}^{n \times p}$:
$$C_{jl} = \frac{c_{jl}}{\sum_{l'=1}^{p} c_{jl'}} \cdot \log \frac{|\{\mathbf{c}_j\}|}{|\{\mathbf{c}_j : c_{jl} > 0\}|}, \quad \forall j = 1, \ldots, n;\ l = 1, \ldots, p, \qquad (4)$$
where $|\{\mathbf{c}_j\}|$ is the number of all the count vectors (i.e. the number of locations), and $|\{\mathbf{c}_j : c_{jl} > 0\}|$ is the number of count
vectors (i.e. locations) with a non-zero count of the l-th POI type. In this way, we increase the weights of important POIs that
are rare but distinctive (e.g. movie theaters), and decrease the weights of widely distributed POIs (e.g. restaurants).
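Eq. (4) can be computed directly from the per-location count vectors. The following is a minimal Python sketch. The natural logarithm is an assumption, since the base is not stated. The toy counts are illustrative only.

```python
import math
from typing import List


def tfidf_location_features(counts: List[List[int]]) -> List[List[float]]:
    """Eq. (4): turn per-location POI counts c_j into TF-IDF features C_jl."""
    n = len(counts)     # |{c_j}|: number of locations
    p = len(counts[0])  # number of POI types
    # df[l] = |{c_j : c_jl > 0}|, the number of locations containing POI type l
    df = [sum(1 for c in counts if c[l] > 0) for l in range(p)]
    features = []
    for c in counts:
        total = sum(c)
        row = []
        for l in range(p):
            if total == 0 or df[l] == 0:
                row.append(0.0)  # no POIs at this location, or a POI type never observed
            else:
                row.append(c[l] / total * math.log(n / df[l]))
        features.append(row)
    return features


# Toy counts: columns are [restaurants, movie theaters] (illustrative only).
counts = [[4, 0],
          [2, 2]]
F = tfidf_location_features(counts)
print([[round(v, 4) for v in row] for row in F])  # [[0.0, 0.0], [0.0, 0.3466]]
```

Restaurants appear at every location, so their IDF factor is log(1) = 0. The rarer theater type keeps a positive weight.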
4.3. Activity–activity correlation extraction
Knowing the correlations between activities can help us better infer what users may do at a location based on
the activities observed there. One possible way to get such correlation information is to calculate it
directly from the GPS data; but due to the limited number of comments, the results may not be reliable. Fortunately, such
activity correlations are often common sense and are likely reflected on the World Wide Web. To mine such common
sense, we turn to Web search [16]. In particular, for each pair of activities, we put their names together as
a query and submit it to a commercial search engine to get the Webpage hit count. For example, given activities “food
and drink” and “shopping”, we generate a query “food and drink, and shopping” and send it to Bing. Bing then returns a list
of Webpages that describe these two activities together, and, as expected, the number of such returned Webpages indicates
the correlation between them. Indeed, we find that the hit count for “food and drink, shopping” (48.5 million⁵) is higher than
that for “food and drink, and sports and exercises” (39.4 million), showing that the correlation of “food and drink” with
“shopping” is higher than with “sports and exercise”, which coincides with common sense. Based on this method, we
obtain an activity–activity matrix $D \in \mathbb{R}^{r \times r}$, with each entry defined as
$$D_{ij} = h_{ij} / h^*, \quad \forall i = 1, \ldots, r;\ j = 1, \ldots, r, \qquad (5)$$
where $h_{ij}$ is the search-engine hit count for activity i and activity j. In this paper, we employ a simple
normalization strategy that divides each hit count by $h^* = \max_{i,j} h_{ij}$, the maximal hit count
among all the hit counts for each pair of activities.
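The normalization in Eq. (5) is a single division by the largest hit count. Here is a minimal Python sketch on toy numbers. The hit counts are hypothetical and are not the Bing results quoted above. Setting the diagonal to 0 is our assumption, since the text does not say how $h_{ii}$ is obtained.

```python
from typing import List


def activity_correlation(hits: List[List[float]]) -> List[List[float]]:
    """Eq. (5): D_ij = h_ij / h*, where h* is the largest pairwise hit count."""
    h_star = max(max(row) for row in hits)
    return [[h / h_star for h in row] for row in hits]


# Toy symmetric hit counts (in millions) for three hypothetical activities;
# the diagonal is set to 0 here, which is our assumption, not the paper's.
hits = [[0.0, 50.0, 40.0],
        [50.0, 0.0, 25.0],
        [40.0, 25.0, 0.0]]
print(activity_correlation(hits))  # [[0.0, 1.0, 0.8], [1.0, 0.0, 0.5], [0.8, 0.5, 0.0]]
```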
4.4. More information about user
In addition to the activity and location information extracted above, we also have the user–user matrix
$B \in \mathbb{R}^{m \times m}$, which encodes the user–user similarities. In this study, we use demographic information such as the age, gender
and job of each user to form a feature vector, and then measure the cosine similarity between each pair of users
based on their demographic feature vectors. There are other ways to obtain such user–user similarities, such as using
online social network services or questionnaires about each user’s friend network, but we do not exploit them
here and leave them for future study. In general, we aim to use such similarity information to uncover like-minded users
in CF. Optionally, we can also extract a matrix $E \in \mathbb{R}^{m \times n}$ from the GPS data to capture user–location preferences. This
matrix can be helpful for modeling the case in which we only know that a user visited some place but have no idea what she was
doing there.
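The user–user matrix B is described as pairwise cosine similarity of demographic feature vectors. The following is a minimal Python sketch. The feature encoding (age scaled by 1/100, one-hot gender and job) is a hypothetical choice for illustration. The text does not specify how age, gender and job are encoded.

```python
import math
from typing import List


def cosine_similarity_matrix(features: List[List[float]]) -> List[List[float]]:
    """B_ab = cos(f_a, f_b) for every pair of user feature vectors."""
    norms = [math.sqrt(sum(v * v for v in f)) for f in features]
    m = len(features)
    sim = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            if norms[a] > 0 and norms[b] > 0:  # leave all-zero vectors at similarity 0
                dot = sum(u * v for u, v in zip(features[a], features[b]))
                sim[a][b] = dot / (norms[a] * norms[b])
    return sim


# Hypothetical encoding: [age / 100, male, female, student, engineer].
users = [[0.25, 1, 0, 1, 0],
         [0.25, 1, 0, 1, 0],
         [0.40, 0, 1, 0, 1]]
B = cosine_similarity_matrix(users)
print(round(B[0][1], 3), round(B[0][2], 3))  # 1.0 0.047
```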
5. Mobile recommendations
Our goal is to predict a reasonable ranking on the missing entries of user–location–activity tensor A. In addition to the
existing entries in the tensor, we also have some additional inputs such as location features, activity correlations and user–
user similarities that can help prediction. In the following, we propose three collaborative filtering algorithms to achieve our
goal.
5.1. Collaborative location and activity filtering
As each user has limited location–activity ratings, one possible solution is to merge all the users’ ratings together in
order to get a denser location–activity matrix. In particular, we can compress the 3-D user–location–activity
tensor into a 2-D location–activity matrix. As shown in Fig. 7, we obtain a location–activity matrix $A \in \mathbb{R}^{n \times r}$ by setting
$A_{ij} = \sum_{k=1}^{m} \mathcal{A}_{kij}$, $\forall i = 1, \ldots, n;\ j = 1, \ldots, r$. This matrix aggregates the ratings of all the users, so it tells us
what people usually do when they visit a place. We can use this knowledge to guide our recommendation.
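With a sparse tensor keyed by (user, location, activity), the compression $A_{ij} = \sum_{k=1}^{m} \mathcal{A}_{kij}$ is a single pass that drops the user index. The following is a minimal Python sketch. The dictionary layout is our assumption.

```python
from collections import Counter
from typing import Dict, Tuple


def compress_to_location_activity(tensor: Dict[Tuple[int, int, int], float]) -> Counter:
    """A_ij = sum over users k of tensor entry (k, i, j): merge all users' ratings."""
    matrix: Counter = Counter()
    for (_user, loc, act), value in tensor.items():
        matrix[(loc, act)] += value
    return matrix


# Sparse toy tensor keyed by (user, location, activity).
tensor = {(0, 0, 0): 2, (1, 0, 0): 1, (1, 1, 0): 1, (0, 1, 1): 1}
print(dict(compress_to_location_activity(tensor)))  # {(0, 0): 3, (1, 0): 1, (1, 1): 1}
```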
Though the matrix $A$ is already denser than the tensor $\mathcal{A}$, it still has many missing entries. Consequently, our task
becomes filling the missing entries in $A$. For a missing entry $A_{ij}$, we use collaborative filtering to predict its value. In
general, if we know that location i is suitable for doing some activity j (such as “Shopping”), and another location i′ is
similar to location i, we may infer that location i′ is also suitable for doing activity j. This intuition can be captured by
decomposing
$$A_{ij} = \mathbf{x}_i \cdot \mathbf{y}_j,$$
⁵ All the hit count values shown here are based on the search results on May 23, 2010.
Fig. 7. Compress the user–location–activity tensor to a location–activity matrix.
Fig. 8. Model demonstration for our CLAF algorithm.
where $\mathbf{x}_i \in \mathbb{R}^{d}$ is the latent factor for location i (the i-th row of $X \in \mathbb{R}^{n \times d}$) and $\mathbf{y}_j \in \mathbb{R}^{d}$ is the latent factor for activity j (the j-th row of $Y \in \mathbb{R}^{r \times d}$). Here, d is the latent factor
dimension. The location latent factors characterize the properties of locations, so for similar locations i and i′, their
latent factors $\mathbf{x}_i$ and $\mathbf{x}_{i'}$ are similar. As a result, if location i is suitable for activity j (i.e. $A_{ij}$ is large), then we can predict
that location i′ is also suitable for activity j (i.e. $A_{i'j}$ is also large), given the same activity latent factor $\mathbf{y}_j$. We can find these
latent factors $\mathbf{x}_i$ and $\mathbf{y}_j$ for each location i and activity j based on the existing entries in the matrix $A$. Specifically, we can
minimize the following objective function to obtain $\mathbf{x}_i$ and $\mathbf{y}_j$:
$$\mathcal{L}(X, Y) = \sum_{(i,j) \in D_A} (\mathbf{x}_i \cdot \mathbf{y}_j - A_{ij})^2, \qquad (6)$$
where the loss is computed on the set $D_A$ of existing entries in matrix $A$. In addition, we can use location features
and activity correlations to help this optimization. For example, the location features give us prior
knowledge of whether one location is similar to another based on their feature values. The activity–activity
correlations tell us how likely the occurrence of one activity implies the occurrence of another. For
example, many people choose to have food and drink in shopping malls, as there are usually many restaurants and
bars in or near them. Therefore, if we observe that a location is suitable for the activity “shopping”, it is likely also suitable
for the activity “food and drink”. This information gives us prior knowledge about the activity latent factors, and thus can
help the matrix factorization of $A$. As shown in Fig. 8, we then aim to factorize the target location–activity matrix $A$, the
additional location–feature matrix $C$ and the activity–activity correlation matrix $D$ together.
Formally, we propose to employ collective matrix factorization [17] to develop a collaborative location and activity
filtering model, with the objective function:
$$\mathcal{L}(X, Y, Z) = \sum_{(i,j) \in D_A} (\mathbf{x}_i \cdot \mathbf{y}_j - A_{ij})^2 + \beta_1 \sum_{(i,k) \in D_C} (\mathbf{x}_i \cdot \mathbf{z}_k - C_{ik})^2 + \beta_2 \sum_{(j,k) \in D_D} D_{jk} \|\mathbf{y}_j - \mathbf{y}_k\|^2 + \beta_3 \left( \|X\|_F^2 + \|Y\|_F^2 + \|Z\|_F^2 \right), \qquad (7)$$
where $\|\cdot\|_F$ denotes the Frobenius norm and $\beta_i \ge 0$, $\forall i$, are parameters to be tuned manually. In the above objective function,
we aim to propagate information among the matrices $A$, $C$ and $D$ by requiring them to share the latent factors
$X \in \mathbb{R}^{n \times d}$, $Y \in \mathbb{R}^{r \times d}$ and $Z \in \mathbb{R}^{p \times d}$. The first two terms in the objective function measure the matrix factorization loss
on $A$ and $C$. The third term forces the learned latent activity factors $\mathbf{y}_i$ and $\mathbf{y}_j$ to be more similar if activity i and activity
j have a higher correlation (i.e. $D_{ij}$ is larger). The last term regularizes the factorized matrices to
prevent overfitting.
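Eq. (7) can be checked on a tiny instance by evaluating it term by term. The following is a minimal Python sketch. It assumes factors are stored as nested lists and the observed-entry sets $D_A$, $D_C$ and $D_D$ are dictionaries keyed by index pairs. The data layout, names and toy values are ours.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Entries = Dict[Tuple[int, int], float]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def claf_objective(X: Matrix, Y: Matrix, Z: Matrix, A_obs: Entries, C_obs: Entries,
                   D_obs: Entries, beta1: float, beta2: float, beta3: float) -> float:
    """Evaluate Eq. (7) on the observed entries D_A, D_C and D_D."""
    # Factorization loss on the location-activity matrix A.
    loss = sum((dot(X[i], Y[j]) - a) ** 2 for (i, j), a in A_obs.items())
    # Factorization loss on the location-feature matrix C (shares X with A).
    loss += beta1 * sum((dot(X[i], Z[k]) - c) ** 2 for (i, k), c in C_obs.items())
    # Correlated activities are pulled toward nearby latent factors.
    loss += beta2 * sum(w * sum((a - b) ** 2 for a, b in zip(Y[j], Y[k]))
                        for (j, k), w in D_obs.items())
    # Frobenius-norm regularization of all three factor matrices.
    loss += beta3 * sum(v * v for M in (X, Y, Z) for row in M for v in row)
    return loss


X = [[1.0], [2.0]]  # n = 2 locations, d = 1
Y = [[1.0], [0.5]]  # r = 2 activities
Z = [[1.0]]         # p = 1 POI type
A_obs = {(0, 0): 2.0, (1, 1): 1.0}
C_obs = {(0, 0): 1.0, (1, 0): 1.0}
D_obs = {(0, 1): 0.5}
print(round(claf_objective(X, Y, Z, A_obs, C_obs, D_obs, 1.0, 2.0, 0.1), 6))  # 2.975
```

The toy value decomposes as 1 (A term) + 1 (C term) + 0.25 (correlation term) + 0.725 (regularization).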
In general, this objective function is not jointly convex to all the variables X , Y and Z , and we cannot get closed-form
solutions for minimizing the objective function. Therefore, we turn to some numerical method such as stochastic gradient
descent to get the local optimal solutions. Specifically, we obtain the gradients for each variable in Table 1.
Finally, we use stochastic gradient descent to iteratively minimize the objective function; the details are given in Algorithm 1. In each iteration, the algorithm first randomly samples one existing entry $(i, j)$ of the matrix $A$ via a bootstrap operation (random sampling with replacement). It then updates the latent factors $x_i$ and $y_j$, as well as all the latent factors $z_k$. Once $X$ and $Y$ have converged, we can predict the missing values in matrix $A$, and based on these predictions we provide both location and activity recommendations. Note that this algorithm targets general recommendation, so the system gives the same recommendation results to all users.
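As an illustration only, the following NumPy sketch mirrors Algorithm 1 with the gradients of Table 1. It assumes dense arrays with a 0/1 mask `M` marking the observed entries of $A$ (the set $D_A$), treats every entry of $C$ as observed, and takes $D$ to be symmetric; the function names, initialization scale, learning rate `gamma` and epoch count are our own choices, not values from the paper. Because Table 1 drops constant factors, the updates are exact gradient steps on half of Eq. (7) when its $D$ term is summed over unordered pairs $j < k$, which is how the hypothetical helper `claf_objective` is written.

```python
import numpy as np


def claf_objective(X, Y, Z, A, M, C, D, beta):
    """Eq. (7), with the D term summed over unordered activity pairs j < k."""
    b1, b2, b3 = beta
    fit_a = np.sum(M * (X @ Y.T - A) ** 2)          # observed entries of A only
    fit_c = np.sum((X @ Z.T - C) ** 2)              # every entry of C
    diff = Y[:, None, :] - Y[None, :, :]            # diff[j, k] = y_j - y_k
    smooth = np.sum(np.triu(D, 1) * np.sum(diff ** 2, axis=2))
    reg = np.sum(X ** 2) + np.sum(Y ** 2) + np.sum(Z ** 2)
    return fit_a + b1 * fit_c + b2 * smooth + b3 * reg


def claf_sgd(A, M, C, D, d=2, beta=(0.1, 1.0, 0.1), gamma=0.01,
             n_epochs=100, seed=0):
    """Sketch of Algorithm 1 with the gradients of Table 1.

    A: (n, r) location-activity ratings, M: (n, r) 0/1 mask of observed entries,
    C: (n, p) location-feature matrix, D: (r, r) symmetric activity correlations.
    """
    b1, b2, b3 = beta
    rng = np.random.default_rng(seed)
    n, r = A.shape
    p = C.shape[1]
    X = 0.1 * rng.standard_normal((n, d))           # location factors x_i
    Y = 0.1 * rng.standard_normal((r, d))           # activity factors y_j
    Z = 0.1 * rng.standard_normal((p, d))           # feature factors z_k
    observed = np.argwhere(M > 0)                   # the index set D_A
    for _ in range(n_epochs):
        for _ in range(len(observed)):
            # bootstrap: draw one observed entry with replacement
            i, j = observed[rng.integers(len(observed))]
            g_x = ((M[i] * (X[i] @ Y.T - A[i])) @ Y
                   + b1 * (X[i] @ Z.T - C[i]) @ Z + b3 * X[i])
            X[i] -= gamma * g_x
            # sum_k D_jk (y_j - y_k): the k = j term is zero, so no exclusion is needed
            g_y = ((M[:, j] * (X @ Y[j] - A[:, j])) @ X
                   + b2 * (D[j].sum() * Y[j] - D[j] @ Y) + b3 * Y[j])
            Y[j] -= gamma * g_y
            g_z = b1 * (X @ Z.T - C).T @ X + b3 * Z     # all z_k at once
            Z -= gamma * g_z
    return X, Y, Z
```

After training, `X @ Y.T` gives the completed location–activity matrix from which both location and activity rankings can be read off.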
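Table 1
Gradients for Eq. (7). Without loss of generality, we ignore the constant value of 1/2 throughout the gradient derivations in this paper.

$$
\begin{aligned}
\frac{\partial L}{\partial x_i} &= \sum_j (x_i \cdot y_j - A_{ij})\, y_j + \beta_1 \sum_k (x_i \cdot z_k - C_{ik})\, z_k + \beta_3 x_i,\\
\frac{\partial L}{\partial y_j} &= \sum_i (x_i \cdot y_j - A_{ij})\, x_i + \beta_2 \sum_{k \ne j} D_{jk} (y_j - y_k) + \beta_3 y_j,\\
\frac{\partial L}{\partial z_k} &= \beta_1 \sum_i (x_i \cdot z_k - C_{ik})\, x_i + \beta_3 z_k.
\end{aligned}
$$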
Algorithm 1 The CLAF algorithm
1: Randomly initialize the parameters $X$, $Y$ and $Z$;
2: repeat
3: for $t = 1$ to $|D_A|$ do
4: $(i, j) \leftarrow \mathrm{bootstrap}(D_A)$; // random sampling with replacement
5: Update $x_i \leftarrow x_i - \gamma \frac{\partial L}{\partial x_i}$; // according to Table 1
6: Update $y_j \leftarrow y_j - \gamma \frac{\partial L}{\partial y_j}$; // according to Table 1
7: Update $z_k \leftarrow z_k - \gamma \frac{\partial L}{\partial z_k}$, $\forall k$; // according to Table 1
8: end for
9: until convergence
Fig. 9. Model demonstration for our PCLAF algorithm.
5.2. Personalized collaborative location and activity filtering
One limitation of our CLAF algorithm is that it cannot personalize recommendations for each user. Therefore, we propose the PCLAF algorithm to address this problem [4]. Specifically, we directly model the user–location–activity tensor $\mathcal{A}$ under the factorization framework, and use as much additional information as possible to help alleviate the data sparsity issue. The PCLAF model is illustrated in Fig. 9. Our goal here is to fill the missing entries in tensor $\mathcal{A}$. In addition to the location features and activity correlations used in CLAF, we introduce more information about the users, since we now model each user directly in collaborative filtering. In particular, we use the matrix $B \in \mathbb{R}^{m\times m}$, which encodes user–user similarities, to uncover like-minded users in CF. We also derive a matrix $E \in \mathbb{R}^{m\times n}$ from the GPS data to model users' location-visiting preferences, which helps capture each user's preferences over locations. Note that there have been studies on collective matrix factorization [17], on modeling multi-dimensional (tensor) data with memory-based CF [18], and on single tensor factorization [1,2], but few of them handle collective tensor and matrix factorization together.
To fill missing entries in the tensor A, we follow the model-based methods [10,17] to decompose the tensor A w.r.t.
each tensor entity (i.e. users, locations and activities). In factorization, we force the latent factors to be shared with the
additional matrices so as to utilize their information. After such latent factors are obtained, we can reconstruct the tensor
by filling all the missing entries. In our model, we propose a PARAFAC-style tensor factorization [2] framework to integrate
the tensor with the additional matrices for regularized factorization. Specifically, our objective function is given by Eq. (8) below.
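Table 2
Gradients for Eq. (8), where $\hat{\mathcal{A}}_{ijk} = \sum_{l=1}^{d} x_{il} y_{jl} z_{kl}$ and “∘” is the entry-wise product.

$$
\begin{aligned}
\frac{\partial L}{\partial x_i} &= \sum_{j,k} (\hat{\mathcal{A}}_{ijk} - \mathcal{A}_{ijk})(y_j \circ z_k) + \lambda_1 \sum_{j \ne i} B_{ij} (x_i - x_j) + \lambda_4 \sum_j (x_i \cdot y_j - E_{ij})\, y_j + \lambda_5 x_i,\\
\frac{\partial L}{\partial y_j} &= \sum_{i,k} (\hat{\mathcal{A}}_{ijk} - \mathcal{A}_{ijk})(x_i \circ z_k) + \lambda_2 \sum_l (y_j \cdot v_l - C_{jl})\, v_l + \lambda_4 \sum_i (x_i \cdot y_j - E_{ij})\, x_i + \lambda_5 y_j,\\
\frac{\partial L}{\partial z_k} &= \sum_{i,j} (\hat{\mathcal{A}}_{ijk} - \mathcal{A}_{ijk})(x_i \circ y_j) + \lambda_3 \sum_{l \ne k} D_{kl} (z_k - z_l) + \lambda_5 z_k,\\
\frac{\partial L}{\partial v_l} &= \lambda_2 \sum_j (y_j \cdot v_l - C_{jl})\, y_j + \lambda_5 v_l.
\end{aligned}
$$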
Algorithm 2 The PCLAF algorithm
1: Randomly initialize the parameters $X$, $Y$, $Z$ and $V$;
2: repeat
3: for $t = 1$ to $|\mathcal{D}_A|$ do
4: $(i, j, k) \leftarrow \mathrm{bootstrap}(\mathcal{D}_A)$; // random sampling with replacement
5: Update $x_i \leftarrow x_i - \gamma \frac{\partial L}{\partial x_i}$; // according to Table 2
6: Update $y_j \leftarrow y_j - \gamma \frac{\partial L}{\partial y_j}$; // according to Table 2
7: Update $z_k \leftarrow z_k - \gamma \frac{\partial L}{\partial z_k}$; // according to Table 2
8: Update $v_l \leftarrow v_l - \gamma \frac{\partial L}{\partial v_l}$, $\forall l$; // according to Table 2
9: end for
10: until convergence
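$$
\begin{aligned}
L(X, Y, Z, V) ={}& \sum_{(i,j,k)\in \mathcal{D}_A} \Bigl(\sum_{l=1}^{d} x_{il} y_{jl} z_{kl} - \mathcal{A}_{ijk}\Bigr)^2 + \lambda_1 \sum_{(i,j)\in D_B} B_{ij}\, \|x_i - x_j\|^2 + \lambda_2 \sum_{(j,l)\in D_C} (y_j \cdot v_l - C_{jl})^2\\
&+ \lambda_3 \sum_{(k,l)\in D_D} D_{kl}\, \|z_k - z_l\|^2 + \lambda_4 \sum_{(i,j)\in D_E} (x_i \cdot y_j - E_{ij})^2 + \lambda_5 \left(\|X\|_F^2 + \|Y\|_F^2 + \|Z\|_F^2 + \|V\|_F^2\right),
\end{aligned} \tag{8}
$$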
where $X \in \mathbb{R}^{m\times d}$, $Y \in \mathbb{R}^{n\times d}$, $Z \in \mathbb{R}^{r\times d}$ and $V \in \mathbb{R}^{p\times d}$ are the latent factor matrices for users, locations, activities and location features, respectively. $\lambda_1$–$\lambda_5$ are model parameters; when $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 = 0$, our model degenerates to the standard PARAFAC tensor decomposition (with only the $\lambda_5$ regularizer remaining). Our model is therefore more flexible, as it can exploit additional information about the target entities. In the above objective function, the first term decomposes the user–location–activity tensor $\mathcal{A}$ as a sum of rank-one outer products of the user, location and activity latent factors. The second term is a regularizer on the users, forcing the latent factors of two users to be as close as possible if they are similar according to matrix $B$. The third term borrows the idea of collective matrix factorization [17] by sharing the location latent factor $Y$ with the tensor factorization. The fourth term is a regularizer similar to the second term, forcing the latent factors of two activities to be as close as possible according to their correlations. The fifth term shares the user latent factor $X$ and the location latent factor $Y$ with the tensor factorization. The last term is a regularizer to prevent overfitting.
In general, Eq. (8) has no closed-form solution, so we again use stochastic gradient descent to solve the problem. The gradients are listed in Table 2, and the algorithm details are given in Algorithm 2. Once $X$, $Y$ and $Z$ have converged, we can predict the missing values in tensor $\mathcal{A}$ as $\hat{\mathcal{A}}_{ijk} = \sum_{l=1}^{d} x_{il} y_{jl} z_{kl}$.
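For illustration, here is a NumPy sketch of the PARAFAC reconstruction used to fill the missing entries of $\mathcal{A}$, together with the tensor-fit part of $\partial L/\partial x_i$ from Table 2. It assumes the tensor is stored densely with a 0/1 mask `M` over the observed entries $\mathcal{D}_A$; the auxiliary-matrix and regularization terms ($\lambda_1$ to $\lambda_5$) are omitted, and the helper names are ours, not the paper's.

```python
import numpy as np


def parafac(X, Y, Z):
    """PARAFAC reconstruction: A_hat[i, j, k] = sum_l X[i, l] * Y[j, l] * Z[k, l]."""
    return np.einsum("il,jl,kl->ijk", X, Y, Z)


def tensor_fit(X, Y, Z, A, M):
    """Half the first term of Eq. (8), on the observed entries (M == 1) only."""
    return 0.5 * np.sum(M * (parafac(X, Y, Z) - A) ** 2)


def tensor_grad_x(i, X, Y, Z, A, M):
    """Tensor part of dL/dx_i in Table 2: sum_{j,k} (A_hat_ijk - A_ijk) (y_j o z_k),
    with the sum restricted to the observed entries of user i."""
    residual = M[i] * (parafac(X, Y, Z)[i] - A[i])     # shape (n, r)
    return np.einsum("jk,jl,kl->l", residual, Y, Z)    # shape (d,)


# Usage: complete a small tensor (3 users, 4 locations, 5 activities) from factors.
rng = np.random.default_rng(0)
X, Y, Z = rng.random((3, 2)), rng.random((4, 2)), rng.random((5, 2))
A_hat = parafac(X, Y, Z)
```

Keeping the loss (`tensor_fit`) next to its hand-derived gradient makes it easy to check the gradient by finite differences, which is a cheap safeguard when extending the sketch with the remaining terms of Table 2.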
5.3. Ranking-based personalized collaborative location and activity filtering
Recall that our goal is to build a model that predicts a reasonable ranking of the missing entries in tensor $\mathcal{A}$. Both of our previous algorithms, CLAF and PCLAF, aim to find a model that minimizes the prediction error (e.g. in terms of square loss) with respect to the existing ground-truth ratings. After the model is learned, its predictions on the missing values are used for ranking in order to output recommendation results. Since in recommendation we are essentially interested in the ranking, such a learning strategy may take an indirect route to the problem. In this section, we propose a new algorithm, RPCLAF, which solves the recommendation problem directly by using a ranking loss as the objective function. In particular, RPCLAF models each user's pairwise preferences between different location–activity pairs. By learning from such partial rankings, our model can directly deliver rankings on the missing entries.
Compared with our previous two algorithms CLAF and PCLAF, the new RPCLAF algorithm has several advantages. First, its objective function is based on a ranking loss, which is more consistent with the final goal, so the model may generate better results. Second, whereas PCLAF considers each rating independently, RPCLAF takes rating pairs as input and thus has more data for training. Given that RPCLAF has the same number of latent factor variables as PCLAF, using
more data can help improve the performance. Third, using a ranking loss is potentially useful for handling the different rating scales of the tensor and the matrices, as it only focuses on the pairwise ranking rather than on the absolute values [19]. Note that there have been some studies using ranking loss for collaborative filtering [20,21], but few of them consider it in the collective tensor and matrix factorization scenario.
First, let us define the location–activity pairwise preference as

$$
\eta_{i,j,k,j',k'} =
\begin{cases}
+1 & \text{if } \mathcal{A}_{i,j,k} > \mathcal{A}_{i,j',k'} \mid (i,j,k) \in I_i \wedge (i,j',k') \in I_i;\\
0 & \text{if } \mathcal{A}_{i,j,k} = \mathcal{A}_{i,j',k'} \mid (i,j,k) \in I_i \wedge (i,j',k') \in I_i;\\
-1 & \text{if } \mathcal{A}_{i,j,k} < \mathcal{A}_{i,j',k'} \mid (i,j,k) \in I_i \wedge (i,j',k') \in I_i;\\
? & \text{if } (i,j,k) \notin I_i \vee (i,j',k') \notin I_i;
\end{cases}
$$
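where $I_i$ denotes the existing entries for user $i$ in tensor $\mathcal{A}$. Then, to formulate the probabilities of these pairwise preferences, we follow the Bradley–Terry model [22,19] and define

$$
\begin{aligned}
p(\eta_{i,j,k,j',k'} = +1) &= \sigma(\mathcal{A}_{i,j,k} - \mathcal{A}_{i,j',k'}),\\
p(\eta_{i,j,k,j',k'} = -1) &= \sigma(\mathcal{A}_{i,j',k'} - \mathcal{A}_{i,j,k}),\\
p(\eta_{i,j,k,j',k'} = 0) &= 1 - \sigma(\mathcal{A}_{i,j,k} - \mathcal{A}_{i,j',k'}) - \sigma(\mathcal{A}_{i,j',k'} - \mathcal{A}_{i,j,k}),
\end{aligned}
$$

where $\sigma(x) = \frac{1}{1 + \theta e^{-x}}$ is the logistic sigmoid function. The parameter $\theta \ge 1$ controls the probability of ties: with $\theta = 1$ the tie probability is zero, and it grows as $\theta$ increases.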
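A minimal Python sketch of these three probabilities, for illustration only; `sigma` and `pairwise_probs` are our own names, and the default $\theta = e$ matches the value $\theta = e^1$ used in the experiments of Section 6.3.

```python
import math


def sigma(x, theta):
    """Tie-aware sigmoid 1 / (1 + theta * exp(-x)) with theta >= 1.
    (math.exp may overflow for very negative x; this sketch does not guard it.)"""
    return 1.0 / (1.0 + theta * math.exp(-x))


def pairwise_probs(a_first, a_second, theta=math.e):
    """Return (p(+1), p(-1), p(0)): the first rating is preferred, the second
    is preferred, or the two are tied, under the Bradley-Terry model with ties."""
    s = a_first - a_second
    p_pos = sigma(s, theta)
    p_neg = sigma(-s, theta)
    return p_pos, p_neg, 1.0 - p_pos - p_neg
```

The three values always sum to one, and the tie probability is non-negative exactly because $\theta \ge 1$ makes $\sigma(s) + \sigma(-s) \le 1$.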
Given the Bradley–Terry model, the data log-likelihood follows directly. Specifically, denote by $\mathcal{D}_{+1} = \{(i,j,k,j',k') \mid \eta_{i,j,k,j',k'} = +1\}$ the set of data with positive preference, and by $\mathcal{D}_0 = \{(i,j,k,j',k') \mid \eta_{i,j,k,j',k'} = 0\}$ the set of data with tied preference. We then construct the pairwise preference data set $\mathcal{D}_A = \mathcal{D}_{+1} \cup \mathcal{D}_0$. The loss function is the negative log-likelihood:

$$
L_{\mathrm{tensor}} = -\sum_{(i,j,k,j',k')\in \mathcal{D}_{+1}} \ln p(\eta_{i,j,k,j',k'} = +1) - \sum_{(i,j,k,j',k')\in \mathcal{D}_0} \ln p(\eta_{i,j,k,j',k'} = 0), \tag{9}
$$

where, as in our PCLAF algorithm, the tensor rating is factorized in a PARAFAC manner: $\mathcal{A}_{i,j,k} \approx \sum_{l=1}^{d} x_{il} y_{jl} z_{kl}$.
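The construction of $\mathcal{D}_{+1}$ and $\mathcal{D}_0$ and the loss in Eq. (9) can be sketched in Python with NumPy as follows. This is an illustrative sketch under our own assumptions: observed ratings are given as a dictionary keyed by $(i, j, k)$, the function names are hypothetical, and each tied pair is kept once (in a fixed order) because the text does not say whether both orders enter $\mathcal{D}_0$.

```python
from itertools import permutations

import numpy as np


def build_pairs(observed):
    """observed: dict mapping (i, j, k) to a known rating.
    Returns (pos, tie) lists of (i, j, k, j2, k2) tuples for D_{+1} and D_0."""
    by_user = {}
    for (i, j, k), rating in observed.items():
        by_user.setdefault(i, []).append((j, k, rating))
    pos, tie = [], []
    for i, items in by_user.items():
        for (j, k, a), (j2, k2, a2) in permutations(items, 2):
            if a > a2:
                pos.append((i, j, k, j2, k2))
            elif a == a2 and (j, k) < (j2, k2):   # keep each tied pair once
                tie.append((i, j, k, j2, k2))
    return pos, tie


def sigma(x, theta):
    return 1.0 / (1.0 + theta * np.exp(-x))


def tensor_nll(X, Y, Z, pos, tie, theta=np.e):
    """Eq. (9) with PARAFAC ratings A_ijk ~ sum_l X[i, l] * Y[j, l] * Z[k, l]."""
    def score_gap(i, j, k, j2, k2):
        return np.sum(X[i] * Y[j] * Z[k]) - np.sum(X[i] * Y[j2] * Z[k2])

    loss = 0.0
    for pair in pos:                       # -ln p(eta = +1)
        loss -= np.log(sigma(score_gap(*pair), theta))
    for pair in tie:                       # -ln p(eta = 0)
        s = score_gap(*pair)
        loss -= np.log(1.0 - sigma(s, theta) - sigma(-s, theta))
    return float(loss)
```

Pairs are formed only within a user's own existing entries, which matches the "?" case of $\eta$: entries outside $I_i$ never enter the loss.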
Similar to our previous PCLAF algorithm, we also use the user–location matrix to model user preferences on locations. In particular, we define the pairwise preference as:

$$
\zeta_{i,j,j'} =
\begin{cases}
+1 & \text{if } E_{i,j} > E_{i,j'} \mid (i,j) \in J_i \wedge (i,j') \in J_i \wedge (i,j,\cdot,j',\cdot) \in \mathcal{D}_A;\\
0 & \text{if } E_{i,j} = E_{i,j'} \mid (i,j) \in J_i \wedge (i,j') \in J_i \wedge (i,j,\cdot,j',\cdot) \in \mathcal{D}_A;\\
-1 & \text{if } E_{i,j} < E_{i,j'} \mid (i,j) \in J_i \wedge (i,j') \in J_i \wedge (i,j,\cdot,j',\cdot) \in \mathcal{D}_A;\\
? & \text{otherwise},
\end{cases}
$$

where $J_i$ is the set of existing entries for user $i$ in matrix $E$. Denote by $D_{+1} = \{(i,j,j') \mid \zeta_{i,j,j'} = +1\}$ the set of data with positive preference, by $D_{-1} = \{(i,j,j') \mid \zeta_{i,j,j'} = -1\}$ the set of data with negative preference, and by $D_0 = \{(i,j,j') \mid \zeta_{i,j,j'} = 0\}$ the set of data with tied preference. The negative log-likelihood is then

$$
L_{\mathrm{pref}} = -\lambda_4 \Bigl(\sum_{(i,j,j')\in D_{+1}} \ln p(\zeta_{i,j,j'} = +1) + \sum_{(i,j,j')\in D_{-1}} \ln p(\zeta_{i,j,j'} = -1) + \sum_{(i,j,j')\in D_0} \ln p(\zeta_{i,j,j'} = 0)\Bigr), \tag{10}
$$

where the matrix rating is factorized as $E_{i,j} \approx x_i \cdot y_j$ and the probabilities follow the same Bradley–Terry form as above.
For the other auxiliary information, such as user similarities, location features and activity correlations, we define the loss function as

$$
L_{\mathrm{aux}} = \lambda_1 \sum_{(i,l)\in D_B} B_{il}\, \|x_i - x_l\|^2 + \lambda_2 \sum_{(j,l)\in D_C} (y_j \cdot v_l - C_{jl})^2 + \lambda_3 \sum_{(k,l)\in D_D} D_{kl}\, \|z_k - z_l\|^2, \tag{11}
$$

and the regularization term is

$$
R = \lambda_5 \left(\|X\|^2 + \|Y\|^2 + \|Z\|^2 + \|V\|^2\right), \tag{12}
$$

where all the $\lambda$'s are positive real numbers.
Finally, we aim to minimize the following objective function:

$$
L(X, Y, Z, V) = L_{\mathrm{tensor}} + L_{\mathrm{aux}} + L_{\mathrm{pref}} + R. \tag{13}
$$
We use stochastic gradient descent to solve this minimization problem, with the gradients for each parameter shown in Tables 3, 4 and 5. The algorithm details are given in Algorithm 3. In each iteration, the algorithm first randomly samples one user $i$ and two of her existing rating entries, $(i,j,k)$ and $(i,j',k')$, in the tensor $\mathcal{A}$ via a bootstrap operation.
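Table 3
Gradients for Eq. (9), where $\sigma_{ijkj'k'} = \sigma(\mathcal{A}_{i,j,k} - \mathcal{A}_{i,j',k'})$.

$$
\begin{aligned}
\frac{\partial L_{\mathrm{tensor}}}{\partial x_i} &=
\begin{cases}
(\sigma_{ijkj'k'} - 1)(y_j \circ z_k - y_{j'} \circ z_{k'}) & \text{if } (i,j,k,j',k') \in \mathcal{D}_{+1};\\
(\sigma_{ijkj'k'} - \sigma_{ij'k'jk})(y_j \circ z_k - y_{j'} \circ z_{k'}) & \text{if } (i,j,k,j',k') \in \mathcal{D}_0.
\end{cases}\\[6pt]
\frac{\partial L_{\mathrm{tensor}}}{\partial y_j} &=
\begin{cases}
(\sigma_{ijkj'k'} - 1)(x_i \circ z_k) & \text{if } (i,j,k,j',k') \in \mathcal{D}_{+1};\\
(\sigma_{ijkj'k'} - \sigma_{ij'k'jk})(x_i \circ z_k) & \text{if } (i,j,k,j',k') \in \mathcal{D}_0.
\end{cases}\\[6pt]
\frac{\partial L_{\mathrm{tensor}}}{\partial y_{j'}} &=
\begin{cases}
(\sigma_{ijkj'k'} - 1)(-x_i \circ z_{k'}) & \text{if } (i,j,k,j',k') \in \mathcal{D}_{+1};\\
(\sigma_{ijkj'k'} - \sigma_{ij'k'jk})(-x_i \circ z_{k'}) & \text{if } (i,j,k,j',k') \in \mathcal{D}_0.
\end{cases}\\[6pt]
\frac{\partial L_{\mathrm{tensor}}}{\partial z_k} &=
\begin{cases}
(\sigma_{ijkj'k'} - 1)(x_i \circ y_j) & \text{if } (i,j,k,j',k') \in \mathcal{D}_{+1};\\
(\sigma_{ijkj'k'} - \sigma_{ij'k'jk})(x_i \circ y_j) & \text{if } (i,j,k,j',k') \in \mathcal{D}_0.
\end{cases}\\[6pt]
\frac{\partial L_{\mathrm{tensor}}}{\partial z_{k'}} &=
\begin{cases}
(\sigma_{ijkj'k'} - 1)(-x_i \circ y_{j'}) & \text{if } (i,j,k,j',k') \in \mathcal{D}_{+1};\\
(\sigma_{ijkj'k'} - \sigma_{ij'k'jk})(-x_i \circ y_{j'}) & \text{if } (i,j,k,j',k') \in \mathcal{D}_0.
\end{cases}
\end{aligned}
$$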
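Table 4
Gradients for Eq. (10), where $\sigma_{ijj'} = \sigma(x_i \cdot y_j - x_i \cdot y_{j'})$.

$$
\begin{aligned}
\frac{\partial L_{\mathrm{pref}}}{\partial x_i} &=
\begin{cases}
\lambda_4 (\sigma_{ijj'} - 1)(y_j - y_{j'}) & \text{if } (i,j,j') \in D_{+1};\\
\lambda_4 (\sigma_{ijj'} - \sigma_{ij'j})(y_j - y_{j'}) & \text{if } (i,j,j') \in D_0;\\
\lambda_4 (1 - \sigma_{ij'j})(y_j - y_{j'}) & \text{if } (i,j,j') \in D_{-1}.
\end{cases}\\[6pt]
\frac{\partial L_{\mathrm{pref}}}{\partial y_j} &=
\begin{cases}
\lambda_4 (\sigma_{ijj'} - 1)\, x_i & \text{if } (i,j,j') \in D_{+1};\\
\lambda_4 (\sigma_{ijj'} - \sigma_{ij'j})\, x_i & \text{if } (i,j,j') \in D_0;\\
\lambda_4 (1 - \sigma_{ij'j})\, x_i & \text{if } (i,j,j') \in D_{-1}.
\end{cases}\\[6pt]
\frac{\partial L_{\mathrm{pref}}}{\partial y_{j'}} &=
\begin{cases}
\lambda_4 (\sigma_{ijj'} - 1)(-x_i) & \text{if } (i,j,j') \in D_{+1};\\
\lambda_4 (\sigma_{ijj'} - \sigma_{ij'j})(-x_i) & \text{if } (i,j,j') \in D_0;\\
\lambda_4 (1 - \sigma_{ij'j})(-x_i) & \text{if } (i,j,j') \in D_{-1}.
\end{cases}
\end{aligned}
$$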
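Table 5
Gradients for Eqs. (11) and (12).

$$
\begin{aligned}
\frac{\partial L_{\mathrm{aux}}}{\partial x_i} &= \lambda_1 \sum_{l \ne i} B_{il} (x_i - x_l), & \frac{\partial R}{\partial x_i} &= \lambda_5 x_i,\\
\frac{\partial L_{\mathrm{aux}}}{\partial y_j} &= \lambda_2 \sum_l (y_j \cdot v_l - C_{jl})\, v_l, & \frac{\partial R}{\partial y_j} &= \lambda_5 y_j,\\
\frac{\partial L_{\mathrm{aux}}}{\partial y_{j'}} &= \lambda_2 \sum_l (y_{j'} \cdot v_l - C_{j'l})\, v_l, & \frac{\partial R}{\partial y_{j'}} &= \lambda_5 y_{j'},\\
\frac{\partial L_{\mathrm{aux}}}{\partial z_k} &= \lambda_3 \sum_{l \ne k} D_{kl} (z_k - z_l), & \frac{\partial R}{\partial z_k} &= \lambda_5 z_k,\\
\frac{\partial L_{\mathrm{aux}}}{\partial z_{k'}} &= \lambda_3 \sum_{l \ne k'} D_{k'l} (z_{k'} - z_l), & \frac{\partial R}{\partial z_{k'}} &= \lambda_5 z_{k'},\\
\frac{\partial L_{\mathrm{aux}}}{\partial v_l} &= \lambda_2 \sum_j (y_j \cdot v_l - C_{jl})\, y_j, & \frac{\partial R}{\partial v_l} &= \lambda_5 v_l.
\end{aligned}
$$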
Then, based on the partial ranking of $\mathcal{A}_{ijk}$ and $\mathcal{A}_{ij'k'}$, it uses the corresponding value of $\eta_{i,j,k,j',k'}$ to derive the gradients of the tensor loss. Given the sampled $(i,j,j')$, the algorithm likewise uses the value of $\zeta_{i,j,j'}$ to derive the gradients of the user–location preference loss. The gradients of the auxiliary-information and regularization terms are then calculated. Next, the algorithm updates the latent factor variables $x_i$, $y_j$, $y_{j'}$, $z_k$ and $z_{k'}$, as well as all the latent factor variables $v_l$. Once $X$, $Y$ and $Z$ have converged, we can predict the missing values in tensor $\mathcal{A}$ for ranking. Note that, compared with the previous PCLAF algorithm, RPCLAF benefits from modeling more data (i.e. entry pairs rather than individual entries) without increasing the number of model parameters.
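To make Table 3 concrete, here is an illustrative NumPy sketch (all names are ours) of the per-pair term of Eq. (9), its Table 3 gradients, and a partial SGD step in the spirit of Algorithm 3. The step uses only $L_{\mathrm{tensor}}$ and the $\lambda_5$ regularizer; in the full algorithm the $L_{\mathrm{pref}}$ and $L_{\mathrm{aux}}$ gradients of Tables 4 and 5 would be added to the same updates. It also assumes $j \ne j'$ and $k \ne k'$, so that each factor row is updated exactly once.

```python
import numpy as np


def sigma(x, theta):
    return 1.0 / (1.0 + theta * np.exp(-x))


def score_gap(X, Y, Z, pair):
    """s = A_hat_ijk - A_hat_ij'k' under the PARAFAC factorization."""
    i, j, k, j2, k2 = pair
    return X[i] @ (Y[j] * Z[k]) - X[i] @ (Y[j2] * Z[k2])


def pair_loss(X, Y, Z, pair, tied, theta=np.e):
    """One term of Eq. (9): -ln p(eta = 0) if tied, else -ln p(eta = +1)."""
    s = score_gap(X, Y, Z, pair)
    if tied:
        return -np.log(1.0 - sigma(s, theta) - sigma(-s, theta))
    return -np.log(sigma(s, theta))


def pair_grads(X, Y, Z, pair, tied, theta=np.e):
    """Table 3 gradients w.r.t. x_i, y_j, y_j', z_k and z_k' for one sampled pair."""
    i, j, k, j2, k2 = pair
    s = score_gap(X, Y, Z, pair)
    # (sigma_ijkj'k' - sigma_ij'k'jk) for D_0, (sigma_ijkj'k' - 1) for D_{+1}
    c = sigma(s, theta) - (sigma(-s, theta) if tied else 1.0)
    return {
        "x_i": c * (Y[j] * Z[k] - Y[j2] * Z[k2]),
        "y_j": c * (X[i] * Z[k]),
        "y_j2": c * (-X[i] * Z[k2]),
        "z_k": c * (X[i] * Y[j]),
        "z_k2": c * (-X[i] * Y[j2]),
    }


def tensor_step(X, Y, Z, pair, tied, theta=np.e, gamma=0.05, lam5=0.1):
    """Partial Algorithm 3 update using only the L_tensor and R gradients."""
    i, j, k, j2, k2 = pair
    g = pair_grads(X, Y, Z, pair, tied, theta)   # all evaluated at the current point
    X[i] -= gamma * (g["x_i"] + lam5 * X[i])
    Y[j] -= gamma * (g["y_j"] + lam5 * Y[j])
    Y[j2] -= gamma * (g["y_j2"] + lam5 * Y[j2])
    Z[k] -= gamma * (g["z_k"] + lam5 * Z[k])
    Z[k2] -= gamma * (g["z_k2"] + lam5 * Z[k2])
```

All five gradients share the same scalar factor `c`, so one evaluation of the score gap per sampled pair suffices; this is also why RPCLAF costs little more per step than PCLAF.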
6. Experimental setup on real-world data
6.1. GPS users, devices and data
In our experiments, we collected data from 119 users who carried GPS devices to record their outdoor trajectories from April 2007 to October 2009. Fig. 10(a) shows the GPS devices used to collect the data, which comprise stand-alone GPS receivers and GPS phones. In general, the GPS sampling interval was set to two seconds. The GPS logs were collected in China, as well as in a few cities in the United States, South Korea and Japan. As most of the logs were generated in Beijing, and for easier evaluation of our system, we extract the logs from Beijing for our experiments. After this preprocessing, we obtain a dataset of around 13,000 GPS trajectories with a total of around 4,000,000 GPS points and a total trajectory
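Algorithm 3 The RPCLAF algorithm
1: Randomly initialize the parameters $X$, $Y$, $Z$ and $V$;
2: repeat
3: for $t = 1$ to $|\mathcal{D}_A|$ do
4: $(i, j, k, j', k') \leftarrow \mathrm{bootstrap}(\mathcal{D}_A)$;
5: Update $x_i \leftarrow x_i - \gamma \frac{\partial L}{\partial x_i}$, where $\frac{\partial L}{\partial x_i} = \frac{\partial L_{\mathrm{tensor}}}{\partial x_i} + \frac{\partial L_{\mathrm{pref}}}{\partial x_i} + \frac{\partial L_{\mathrm{aux}}}{\partial x_i} + \frac{\partial R}{\partial x_i}$;
6: Similarly, update $y_j \leftarrow y_j - \gamma \frac{\partial L}{\partial y_j}$, $y_{j'} \leftarrow y_{j'} - \gamma \frac{\partial L}{\partial y_{j'}}$, $z_k \leftarrow z_k - \gamma \frac{\partial L}{\partial z_k}$, $z_{k'} \leftarrow z_{k'} - \gamma \frac{\partial L}{\partial z_{k'}}$;
7: Update $v_l \leftarrow v_l - \gamma \frac{\partial L}{\partial v_l}$, $\forall l$;
8: end for
9: until convergence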
Fig. 10. GPS devices and data distribution.
Table 6
Activities that we used in the experiments.

| Activities | Descriptions |
|---|---|
| Food and drink | Dining/drinking at restaurants/bars, etc. |
| Shopping | Supermarkets, department stores, etc. |
| Movie and shows | Movies/shows in theaters and exhibitions in museums, etc. |
| Sports and exercise | Doing exercises at stadiums, parks, etc. |
| Tourism and amusement | Tourism, amusement parks, etc. |
length of around 139,000 kilometers. To make sure that we recommend useful locations and activities, we also remove some GPS points associated with users' work and home places. The data distribution in Beijing is shown in Fig. 10(b). To protect the users' privacy, we use these data anonymously.
In this study, we first extract the stay regions with our grid-based clustering algorithm. These stay regions, or locations, are limited to a size of 500 meters × 500 meters and contain at least 12 stay point records each. To evaluate our models correctly, we remove all users, locations and activities without comments. After this processing, we have 119 users, 68 locations and five activities; the five activities are defined in Table 6. We gathered more user comments for this study, and each user has 8.9 comments on average. As a user may perform multiple activities at one location during a single visit, a comment can yield more than one rating. After processing, each user has on average 11.7 ratings (i.e. 11.7 entries with values) in her location–activity matrix. In our experiments, we randomly split these known ratings, using some percentage for training and holding out the rest for testing. We do not use any unknown entry in evaluation.
6.2. Evaluation methodology
We employ an objective evaluation methodology to evaluate our algorithms. Specifically, in each trial, we randomly select some percentage (e.g. 30%) of the existing tensor entries for training and hold out the rest for testing. We then employ two metrics. The first is RMSE (root mean square error), which measures the tensor/matrix reconstruction loss on the hold-out test data; for RMSE, smaller is better. The second is AUC (area under the ROC curve), which measures the quality of the rankings derived from the tensor reconstructed from the training data.⁶ Following the definition in [21], we design the AUC score for location ranking (averaged over the $m$ users) as
⁶ We use AUC instead of nDCG (normalized discounted cumulative gain) because our data are very sparse: the ranked lists are usually short (e.g. around 2–3 items), so nDCG values tend to be close for all kinds of algorithms. In contrast, AUC measures the ranking over all data pairs and is therefore more discriminative.
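$$
\mathrm{AUC}_{\mathrm{loc}} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{r_i} \sum_{k=1}^{r_i} \frac{1}{|D_{i\cdot k}|} \underbrace{\sum_{(j,j')\in D_{i\cdot k}} \delta(\hat{\mathcal{A}}_{ijk}, \hat{\mathcal{A}}_{ij'k})}_{\#(\text{correct orders})},
$$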
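where $r_i$ is the number of activities that user $i$ has in the test data and $D_{i\cdot k}$ is the set of location test-data pairs for user $i$ on activity $k$. The indicator function $\delta(\hat{\mathcal{A}}_{ijk}, \hat{\mathcal{A}}_{ij'k}) = 1$ if $(\hat{\mathcal{A}}_{ijk} - \hat{\mathcal{A}}_{ij'k})(\mathcal{A}_{ijk} - \mathcal{A}_{ij'k}) > 0$, or if $|\hat{\mathcal{A}}_{ijk} - \hat{\mathcal{A}}_{ij'k}| < \epsilon \wedge (\mathcal{A}_{ijk} - \mathcal{A}_{ij'k}) = 0$; otherwise it is 0. Here, $\epsilon$ is a tolerance parameter for measuring ties. We tentatively set $\epsilon = 0.1$ and study its impact later. Similarly, we define the AUC score for activity ranking (averaged over the $m$ users) as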
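$$
\mathrm{AUC}_{\mathrm{act}} = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{1}{|D_{ij\cdot}|} \underbrace{\sum_{(k,k')\in D_{ij\cdot}} \delta(\hat{\mathcal{A}}_{ijk}, \hat{\mathcal{A}}_{ijk'})}_{\#(\text{correct orders})}.
$$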
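Here $n_i$ is the number of locations that user $i$ has in the test data and $D_{ij\cdot}$ is the set of activity test-data pairs for user $i$ at location $j$. For AUC, larger is better. Finally, we run each experiment five times and report the mean values and standard deviations of the results.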
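The $\mathrm{AUC}_{\mathrm{loc}}$ score with its tie tolerance can be sketched in Python as follows (illustrative only; names are ours). Two handling details are our own assumptions: an activity contributes to $r_i$ only if user $i$ has at least two test locations for it (otherwise $D_{i\cdot k}$ is empty), and users without any test pair are skipped, so the outer average runs over users with at least one pair rather than exactly $m$. $\mathrm{AUC}_{\mathrm{act}}$ follows by swapping the roles of the location and activity axes.

```python
from itertools import combinations

import numpy as np


def delta(pred_a, pred_b, true_a, true_b, eps=0.1):
    """1 if the predicted pair order matches the true order; a true tie counts as
    correct when the two predictions differ by less than eps."""
    if (pred_a - pred_b) * (true_a - true_b) > 0:
        return 1.0
    if abs(pred_a - pred_b) < eps and true_a == true_b:
        return 1.0
    return 0.0


def auc_loc(A_true, A_pred, test_mask, eps=0.1):
    """For each user i and activity k, the fraction of correctly ordered pairs of
    test locations (j, j'), averaged over activities and then over users."""
    m, _, r = A_true.shape
    user_scores = []
    for i in range(m):
        act_scores = []
        for k in range(r):
            locs = np.flatnonzero(test_mask[i, :, k])
            pairs = list(combinations(locs, 2))
            if not pairs:                 # fewer than two test locations
                continue
            hits = sum(delta(A_pred[i, j, k], A_pred[i, j2, k],
                             A_true[i, j, k], A_true[i, j2, k], eps)
                       for j, j2 in pairs)
            act_scores.append(hits / len(pairs))
        if act_scores:
            user_scores.append(np.mean(act_scores))
    return float(np.mean(user_scores))
```

Since $\delta$ is symmetric in the two entries, counting unordered pairs gives the same fraction as counting ordered ones.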
6.3. System performances
We compare our three algorithms (CLAF, PCLAF and RPCLAF) with six competing baselines: user-based CF (UCF), location-based CF (LCF), activity-based CF (ACF), unifying user–location–activity CF (ULACF), single CF (SCF) and POI-count-based ranking (POIC). In this experiment, we set the model parameters to $\lambda_1 = \lambda_2 = \lambda_4 = \lambda_5 = 0.1$, $\lambda_3 = 1$, $\beta_1 = \beta_3 = 0.1$, $\beta_2 = 1$, $k = 4$ and $\theta = e^1$. We study the impact of these model parameters later.
Baseline algorithms. The first three baselines (i.e. UCF, LCF and ACF) are memory-based methods, adapted from [23] to perform CF on each tensor slice. In particular, for UCF, we perform CF on the user–location matrix of each activity independently. On each matrix, we follow [23] and use the Pearson correlation as the user similarity weight. We find the top $N$ most similar users for a target user (with missing entries) and then compute the weighted average of their ratings to predict the missing entry. Similarly, LCF and ACF perform CF on the location–activity matrix of each user individually. In the experiments, we set $N = 4$, since we find that the prediction results do not vary significantly with $N$.
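The UCF prediction rule described above can be sketched for one tensor slice as follows (Python/NumPy, illustrative only; for UCF, `R` would be the user–location matrix of a single activity). Besides the Pearson weighting and top-$N$ selection from the text, two details are our own assumptions: only neighbours with positive correlation are kept, and the global mean of the observed ratings is returned when no such neighbour exists.

```python
import numpy as np


def pearson(u, v, mask_u, mask_v):
    """Pearson correlation of two rating vectors over their co-rated items."""
    common = mask_u & mask_v
    if common.sum() < 2:
        return 0.0
    a = u[common] - u[common].mean()
    b = v[common] - v[common].mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def ucf_predict(R, mask, user, item, N=4):
    """Predict R[user, item] as the similarity-weighted average of the ratings of
    the top-N most similar users (by Pearson correlation) who rated the item."""
    sims = []
    for v in range(R.shape[0]):
        if v != user and mask[v, item]:
            sims.append((pearson(R[user], R[v], mask[user], mask[v]), v))
    top = [(s, v) for s, v in sorted(sims, reverse=True)[:N] if s > 0]
    if not top:
        return float(R[mask].mean())      # fallback: mean of observed ratings
    weights = np.array([s for s, _ in top])
    values = np.array([R[v, item] for _, v in top])
    return float(weights @ values / weights.sum())
```

LCF and ACF follow the same pattern with the similarity computed between rows or columns of each user's location–activity matrix instead.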
The fourth baseline, ULACF, is also a memory-based method, adapted from [9] to take both the tensor and the additional matrices into consideration. In particular, for each missing entry in the tensor, we extract the top $N_u$ similar users, the top $N_l$ similar locations and the top $N_a$ similar activities. Then, we combine the ratings from these users on the corresponding locations and activities in a weighted manner to calculate the entry value:
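$$
\hat{\mathcal{A}}_{i,j,k} = \frac{1}{4}\left(
\frac{\sum_{u\in R_i} S_{u,i}\, \mathcal{A}_{u,j,k}}{\sum_{u\in R_i} S_{u,i}}
+ \frac{\sum_{l\in R_j} S_{l,j}\, \mathcal{A}_{i,l,k}}{\sum_{l\in R_j} S_{l,j}}
+ \frac{\sum_{a\in R_k} S_{a,k}\, \mathcal{A}_{i,j,a}}{\sum_{a\in R_k} S_{a,k}}
+ \frac{\sum_{u\in R_i,\, l\in R_j,\, a\in R_k} S_{u,l,a}\, \mathcal{A}_{u,l,a}}{\sum_{u,l,a} S_{u,l,a}}
\right),
$$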
where $S_{u,i}$ is the similarity between users $i$ and $u$ learned from the user–user matrix $B$; $S_{l,j}$ is the similarity between locations $j$ and $l$, obtained by equally combining the cosine similarities computed from the location–feature matrix $C$ and from the user–location matrix $E$; $S_{a,k}$ is the similarity between activities $k$ and $a$ learned from the activity–activity matrix $D$; and $S_{u,l,a}$ is the similarity between $\mathcal{A}_{i,j,k}$ and $\mathcal{A}_{u,l,a}$ for some $(u,l,a)$ in the neighboring sets $R_i$, $R_j$, $R_k$ of user $i$, location $j$ and activity $k$, respectively. It is defined as

$$
S_{u,l,a} = \frac{1}{\sqrt{(1/S_{u,i})^2 + (1/S_{l,j})^2 + (1/S_{a,k})^2}}.
$$

In the experiments, we set $N_u = N_l = N_a = 4$, as in the previous cases.
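A small Python sketch of the ULACF combination step, for illustration only. It assumes the neighbourhoods $R_i$, $R_j$, $R_k$ have already been selected and are passed in as lists of (similarity, rating) pairs with positive similarities; the function names are hypothetical.

```python
import math


def fused_similarity(s_user, s_loc, s_act):
    """S_{u,l,a} = 1 / sqrt((1/S_{u,i})^2 + (1/S_{l,j})^2 + (1/S_{a,k})^2)."""
    return 1.0 / math.sqrt((1 / s_user) ** 2 + (1 / s_loc) ** 2 + (1 / s_act) ** 2)


def weighted_mean(pairs):
    """pairs: list of (similarity, rating); returns sum(s * a) / sum(s)."""
    total = sum(s for s, _ in pairs)
    return sum(s * a for s, a in pairs) / total


def ulacf_estimate(user_nb, loc_nb, act_nb, joint_nb):
    """Average of the four similarity-weighted estimates in the ULACF formula."""
    return 0.25 * (weighted_mean(user_nb) + weighted_mean(loc_nb)
                   + weighted_mean(act_nb) + weighted_mean(joint_nb))
```

The fused similarity is always smaller than the smallest of its three inputs, so a joint neighbour $(u, l, a)$ is trusted only when it is close to the target entry along all three dimensions.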
The fifth baseline, SCF, is a model-based method [10] used for comparison with our CLAF algorithm. Like CLAF, it takes the location–activity matrix $A$ as input for CF. The model finds the latent location factor $x$ and latent activity factor $y$ that minimize the loss in Eq. (6).
The last baseline, POIC, uses the POI counts at each location to generate the ranking results for both location and activity recommendation. In particular, we count the number of POIs at each location for each activity category. Then, we normalize the counts to [0, 1] and use them to produce the rankings. For example, if a location has more restaurant and bar POIs than other types of POIs, it is assumed to be more suitable for the activity “food and drink”, regardless of what the mobile users actually do there.
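A minimal sketch of the POIC baseline in Python with NumPy, for illustration only. The text does not specify how the counts are normalized to [0, 1]; dividing by the largest count is our assumption, and the function names are hypothetical. Note that the resulting rankings are the same for every user.

```python
import numpy as np


def poic_scores(poi_counts):
    """poi_counts: (n_locations, n_activities) POI counts per activity category.
    Returns scores in [0, 1] by dividing by the largest count (assumed scheme)."""
    counts = np.asarray(poi_counts, dtype=float)
    top = counts.max()
    return counts / top if top > 0 else counts


def rank_locations(scores, activity):
    """Location indices from most to least suitable for one activity."""
    return [int(x) for x in np.argsort(-scores[:, activity], kind="stable")]


def rank_activities(scores, location):
    """Activity indices from most to least suitable at one location."""
    return [int(x) for x in np.argsort(-scores[location], kind="stable")]
```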
Results. The comparison results are shown in Table 7. We report results for two settings: using 30% and using 50% of the data for training. In both settings, our algorithms generally outperform the baselines, showing the effectiveness of our models. Note that, since PCLAF's objective function is based on the square loss (and it integrates the other auxiliary information well), it achieves the lowest RMSE values among all the algorithms. In contrast to the square loss,
Table 7
Comparison with baselines, with different percentages of data used for training.

| Method | RMSE (30%) | RMSE (50%) | AUC_loc (30%) | AUC_loc (50%) | AUC_act (30%) | AUC_act (50%) |
|---|---|---|---|---|---|---|
| CLAF | 0.35 ± 0.02 | 0.36 ± 0.03 | 0.73 ± 0.03 | 0.80 ± 0.03 | 0.70 ± 0.04 | 0.79 ± 0.06 |
| PCLAF | 0.30 ± 0.01 | 0.29 ± 0.01 | 0.74 ± 0.02 | 0.80 ± 0.01 | 0.83 ± 0.03 | 0.84 ± 0.04 |
| RPCLAF | 0.36 ± 0.00 | 0.34 ± 0.02 | 0.80 ± 0.03 | 0.83 ± 0.02 | 0.85 ± 0.05 | 0.92 ± 0.05 |
| UCF | 0.42 ± 0.01 | 0.38 ± 0.01 | 0.65 ± 0.01 | 0.75 ± 0.02 | 0.59 ± 0.01 | 0.73 ± 0.02 |
| LCF | 0.43 ± 0.01 | 0.37 ± 0.02 | 0.62 ± 0.01 | 0.74 ± 0.02 | 0.71 ± 0.01 | 0.83 ± 0.03 |
| ACF | 0.47 ± 0.01 | 0.58 ± 0.03 | 0.63 ± 0.02 | 0.72 ± 0.01 | 0.58 ± 0.01 | 0.70 ± 0.02 |
| ULACF | 0.47 ± 0.01 | 0.43 ± 0.01 | 0.73 ± 0.02 | 0.80 ± 0.02 | 0.76 ± 0.02 | 0.85 ± 0.05 |
| SCF | 0.39 ± 0.05 | 0.38 ± 0.02 | 0.70 ± 0.02 | 0.78 ± 0.04 | 0.67 ± 0.07 | 0.75 ± 0.06 |
| POIC | 0.49 ± 0.02 | 0.49 ± 0.02 | 0.71 ± 0.01 | 0.76 ± 0.03 | 0.65 ± 0.01 | 0.75 ± 0.02 |
Table 8
Impact of user numbers, in terms of RMSE.

| Method | 60 users | 90 users | 119 users |
|---|---|---|---|
| CLAF | 0.37 ± 0.02 | 0.36 ± 0.02 | 0.35 ± 0.02 |
| PCLAF | 0.33 ± 0.02 | 0.31 ± 0.02 | 0.30 ± 0.02 |
| RPCLAF | 0.38 ± 0.02 | 0.35 ± 0.02 | 0.35 ± 0.02 |
| UCF | 0.42 ± 0.02 | 0.42 ± 0.02 | 0.41 ± 0.02 |
| LCF | 0.43 ± 0.01 | 0.41 ± 0.01 | 0.41 ± 0.03 |
| ACF | 0.49 ± 0.01 | 0.47 ± 0.02 | 0.46 ± 0.01 |
| ULACF | 0.50 ± 0.04 | 0.49 ± 0.03 | 0.49 ± 0.01 |
| SCF | 0.40 ± 0.04 | 0.39 ± 0.06 | 0.37 ± 0.03 |
| POIC | 0.48 ± 0.02 | 0.48 ± 0.02 | 0.48 ± 0.02 |
our RPCLAF's objective function is ranking-oriented; therefore, it achieves the best AUC performance on both location and activity recommendation in our experiments. PCLAF and RPCLAF outperform CLAF, implying that personalization can be useful in recommendation. Besides, SCF can be seen as a special case of CLAF that does not use the auxiliary location and activity information; therefore, its performance is close to that of CLAF.
One interesting question is whether we can simply make recommendations based on the POI counts, ignoring the user data. We see such an approach as a useful baseline, but we do not expect it to work as well as our algorithms, for several reasons. First, ignoring the user data means missing the chance to better understand each individual user's preferences. For example, if we do not know that a user likes going to the gym after work, we may just recommend that she have some food rather than exercise in the area. Second, POI counts do not necessarily reflect POI popularity. For example, a place may have only one or two restaurants, but if both are very good, they may attract many customers. If we do not consider the user data, we may not be able to discover such popularity information and use it for recommendation. Therefore, as Table 7 shows, the POIC baseline is quite competitive but still worse than our models.
It is worth noting that in Table 7, the best RMSE value we achieve is 0.29 (PCLAF with 50% of the ratings as training data). As the rating data used in the experiments are normalized to the range [0, 1], such an RMSE value is in fact not very good. This shows that mobile recommendation is an inherently challenging problem: (i) the training data are usually limited (e.g. 50% of the ratings account for only 1.7% of the tensor entries); (ii) the exact user rating pattern (i.e. exactly how many times a user performed some activity at some location) is not easy to predict. For these reasons, we develop the RPCLAF algorithm, which turns exact rating prediction into preference prediction. In this way, its output is more consistent with the ranking nature of the recommendation problem, and we can make better use of the limited data by considering the additional pairwise preferences.
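The 1.7% figure can be checked from the dataset statistics in Section 6.1 (119 users, 68 locations, five activities, 11.7 ratings per user on average). The short Python computation below is our own back-of-the-envelope check, treating 119 × 11.7 as the total number of known ratings.

```python
# Density of the training data, using the statistics reported in Section 6.1.
users, locations, activities = 119, 68, 5
avg_ratings_per_user = 11.7

total_entries = users * locations * activities          # tensor cells
known_ratings = users * avg_ratings_per_user            # about 1,392 known ratings
density_all = known_ratings / total_entries             # about 3.4%
density_train_50 = 0.5 * density_all                    # about 1.7%

print(f"{total_entries} cells, {known_ratings:.0f} ratings, "
      f"{density_all:.2%} known, {density_train_50:.2%} used for 50% training")
# prints: 40460 cells, 1392 ratings, 3.44% known, 1.72% used for 50% training
```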
6.4. Impact of user numbers
To evaluate the impact of the number of users, we vary the number of users used to build the recommendation system. Specifically, in this experiment, we randomly pick a fixed set of 60 users as test users, and then vary the number of training users from 60 to 119. We run the experiments five times and report the results in terms of RMSE (see Table 8) and AUC (see Table 9). In general, as the number of users increases, performance improves in terms of both RMSE (which decreases) and AUC (both AUC_loc and AUC_act increase). We also notice that the improvement diminishes as the number of training users grows, implying that the performance on this specific set of test data tends to saturate.
Table 9
Impact of user numbers, in terms of AUC.

| Method | AUC_loc (60) | AUC_loc (90) | AUC_loc (119) | AUC_act (60) | AUC_act (90) | AUC_act (119) |
|---|---|---|---|---|---|---|
| CLAF | 0.70 ± 0.02 | 0.72 ± 0.01 | 0.72 ± 0.01 | 0.66 ± 0.07 | 0.69 ± 0.06 | 0.69 ± 0.02 |
| PCLAF | 0.72 ± 0.03 | 0.74 ± 0.02 | 0.74 ± 0.02 | 0.80 ± 0.06 | 0.82 ± 0.07 | 0.83 ± 0.04 |
| RPCLAF | 0.75 ± 0.05 | 0.78 ± 0.05 | 0.78 ± 0.03 | 0.82 ± 0.08 | 0.84 ± 0.06 | 0.85 ± 0.06 |
| UCF | 0.63 ± 0.03 | 0.63 ± 0.02 | 0.65 ± 0.03 | 0.56 ± 0.09 | 0.59 ± 0.02 | 0.60 ± 0.02 |
| LCF | 0.60 ± 0.02 | 0.64 ± 0.04 | 0.64 ± 0.02 | 0.70 ± 0.06 | 0.71 ± 0.05 | 0.72 ± 0.04 |
| ACF | 0.60 ± 0.04 | 0.64 ± 0.04 | 0.65 ± 0.03 | 0.58 ± 0.03 | 0.59 ± 0.02 | 0.60 ± 0.06 |
| ULACF | 0.68 ± 0.04 | 0.72 ± 0.03 | 0.71 ± 0.01 | 0.76 ± 0.02 | 0.79 ± 0.03 | 0.79 ± 0.05 |
| SCF | 0.68 ± 0.05 | 0.71 ± 0.03 | 0.71 ± 0.03 | 0.64 ± 0.06 | 0.67 ± 0.05 | 0.68 ± 0.03 |
| POIC | 0.69 ± 0.02 | 0.69 ± 0.02 | 0.69 ± 0.02 | 0.63 ± 0.04 | 0.63 ± 0.04 | 0.63 ± 0.04 |
Fig. 11. Impact of latent factor dimension d.
Table 10
Impact of θ on RPCLAF.

| θ | RMSE | AUC_loc | AUC_act |
|---|---|---|---|
| $e^{0.1}$ | 0.36 ± 0.04 | 0.79 ± 0.02 | 0.87 ± 0.06 |
| $e^{1}$ | 0.37 ± 0.01 | 0.79 ± 0.03 | 0.85 ± 0.08 |
| $e^{2}$ | 0.35 ± 0.03 | 0.76 ± 0.02 | 0.82 ± 0.07 |
| $e^{3}$ | 0.35 ± 0.01 | 0.74 ± 0.02 | 0.80 ± 0.05 |
6.5. Impact of model parameters
We also study the impact of the model parameters in our three algorithms, including $\lambda_i$ ($i = 1, \ldots, 5$) in PCLAF and RPCLAF, and $\beta_j$ ($j = 1, 2, 3$) in CLAF. In general, $\lambda_1$ controls the contribution of the user similarity input; $\lambda_2$ and $\beta_1$ control the contribution of the location features; $\lambda_3$ and $\beta_2$ control the contribution of the activity correlations; $\lambda_4$ controls the contribution of the user–location preferences; and $\lambda_5$ and $\beta_3$ control the regularization. For each parameter, we vary its value from 0.01 to 10 while fixing the other parameters (e.g. at a value of 0.1). We then run the experiments five times and report the average RMSE and AUC scores in Fig. 12. As shown in the figure, parameter values in [0.1, 1] generally tend to give better performance, showing that moderate weights on these additional inputs are preferred in the optimization. Besides, for the activity correlations in Figs. 12(e) and 12(f), higher values of $\lambda_3$ (and $\beta_2$) tend to give better results. This is possibly because, unlike the other inputs, the activity correlations are small in size (since the number of activities is much smaller than the numbers of users and locations). Therefore, to encode this correlation constraint, we may need a higher weight in the objective function.
Similarly, in Fig. 11, we vary the latent factor dimension $d$ from 2 to 4 (as the smallest tensor dimension is 5, i.e. the number of activities) and report the averaged RMSE and AUC scores. In general, across the different values of $d$, RPCLAF is better than PCLAF and CLAF in terms of AUC scores, whereas PCLAF is better than CLAF and RPCLAF in terms of RMSE.
For RPCLAF, we also have a model parameter $\theta$ that controls the probability of rating ties in the logistic sigmoid function $\sigma(x) = \frac{1}{1 + \theta e^{-x}}$. We study its impact on the performance of RPCLAF and report the results in Table 10. From the table, we see that a larger $\theta$ tends to impose a stronger constraint on modeling the rating ties, and too strong a constraint may lead to a performance drop (both AUC scores decrease as $\theta$ grows from $e^{0.1}$ to $e^3$).
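A short worked equation, added here for illustration, makes the effect of $\theta$ concrete. When the two predicted ratings are equal ($s = 0$), $\sigma(0) = 1/(1+\theta)$, so the tie probability under the Bradley–Terry model with ties is

$$
p(\eta = 0 \mid s = 0) = 1 - \frac{2}{1+\theta} = \frac{\theta - 1}{\theta + 1} = \tanh\!\left(\frac{\ln\theta}{2}\right).
$$

For the settings in Table 10, this gives about 0.05 for $\theta = e^{0.1}$, 0.46 for $\theta = e^{1}$, 0.76 for $\theta = e^{2}$ and 0.91 for $\theta = e^{3}$, so a larger $\theta$ makes the model treat nearby ratings as ties far more readily.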
Fig. 12. Impact of model parameters, where “CLAF (loc)” (or, “CLAF (act)”) in the plots indicates the AUC loc (or, AUC act ) score for CLAF algorithm.
Table 11
Impact of the AUC tolerance parameter ε, with 30% of data used for training.

| Method | AUC_loc (ε = 0.05) | AUC_loc (ε = 0.1) | AUC_loc (ε = 0.2) | AUC_act (ε = 0.05) | AUC_act (ε = 0.1) | AUC_act (ε = 0.2) |
|---|---|---|---|---|---|---|
| CLAF | 0.68 ± 0.02 | 0.73 ± 0.01 | 0.79 ± 0.03 | 0.60 ± 0.04 | 0.67 ± 0.02 | 0.79 ± 0.07 |
| PCLAF | 0.70 ± 0.01 | 0.74 ± 0.02 | 0.81 ± 0.02 | 0.68 ± 0.03 | 0.83 ± 0.03 | 0.92 ± 0.04 |
| RPCLAF | 0.77 ± 0.03 | 0.79 ± 0.03 | 0.85 ± 0.02 | 0.70 ± 0.05 | 0.85 ± 0.08 | 0.95 ± 0.04 |
| UCF | 0.63 ± 0.04 | 0.65 ± 0.03 | 0.65 ± 0.01 | 0.59 ± 0.01 | 0.60 ± 0.04 | 0.62 ± 0.02 |
| LCF | 0.61 ± 0.03 | 0.62 ± 0.04 | 0.65 ± 0.04 | 0.68 ± 0.03 | 0.71 ± 0.05 | 0.73 ± 0.04 |
| ACF | 0.58 ± 0.02 | 0.58 ± 0.05 | 0.64 ± 0.02 | 0.57 ± 0.01 | 0.63 ± 0.03 | 0.64 ± 0.03 |
| ULACF | 0.70 ± 0.02 | 0.73 ± 0.02 | 0.75 ± 0.02 | 0.73 ± 0.01 | 0.79 ± 0.02 | 0.81 ± 0.01 |
| SCF | 0.67 ± 0.01 | 0.71 ± 0.01 | 0.77 ± 0.03 | 0.62 ± 0.05 | 0.68 ± 0.01 | 0.76 ± 0.06 |
| POIC | 0.66 ± 0.03 | 0.71 ± 0.01 | 0.71 ± 0.01 | 0.59 ± 0.02 | 0.65 ± 0.01 | 0.71 ± 0.01 |
Table 12
Comparison with trivial recommender, with different percentages of data used for training.

| Method | RMSE (30%) | RMSE (50%) | AUC_loc (30%) | AUC_loc (50%) | AUC_act (30%) | AUC_act (50%) |
|---|---|---|---|---|---|---|
| CLAF | 0.35 ± 0.02 | 0.36 ± 0.03 | 0.79 ± 0.03 | 0.86 ± 0.02 | 0.79 ± 0.07 | 0.90 ± 0.06 |
| PCLAF | 0.30 ± 0.01 | 0.29 ± 0.01 | 0.81 ± 0.02 | 0.88 ± 0.01 | 0.92 ± 0.04 | 0.96 ± 0.03 |
| RPCLAF | 0.36 ± 0.00 | 0.34 ± 0.02 | 0.85 ± 0.02 | 0.90 ± 0.03 | 0.96 ± 0.03 | 0.98 ± 0.03 |
| Trivial | 0.27 ± 0.00 | 0.26 ± 0.01 | 0.75 ± 0.01 | 0.79 ± 0.00 | 0.94 ± 0.02 | 0.95 ± 0.00 |
6.6. Impact of AUC parameter
In our AUC score, we use a tolerance parameter ε for prediction ties: two predictions whose difference is less than ε are treated as tied in the ranking. We study the impact of ε on the evaluation of all the algorithms. As shown in Table 11, the AUC scores tend to increase as ε increases, because a larger tolerance makes it easier to predict tied orders correctly. The table also shows that our algorithms, especially RPCLAF, generally outperform the baselines: RPCLAF achieves the highest score in every setting except AUC_act with ε = 0.05, where ULACF is best.
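The exact AUC formula is defined earlier in the paper; the following Python sketch only illustrates the role of ε under stated assumptions. It scores a list of held-out entries (for instance, the locations one user rated for one activity) by the fraction of entry pairs whose predicted order agrees with the order of the true ratings, where two predictions closer than ε count as tied. The function name and the pair-counting rule are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def _sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def tie_aware_auc(truth: Sequence[float], pred: Sequence[float], eps: float) -> float:
    """Fraction of item pairs whose predicted order agrees with the true order.

    True ratings are compared exactly; two predictions whose difference is
    less than eps are treated as a tie (hypothetical rule for illustration).
    """
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    pairs = list(combinations(range(len(truth)), 2))
    agree = 0
    for i, j in pairs:
        true_order = _sign(truth[i] - truth[j])
        diff = pred[i] - pred[j]
        pred_order = 0 if abs(diff) < eps else _sign(diff)
        agree += true_order == pred_order
    return agree / len(pairs)


truth = [1.0, 1.0, 0.5]
pred = [0.90, 0.85, 0.40]
print(tie_aware_auc(truth, pred, eps=0.01))  # tied pair counted as wrongly ordered
print(tie_aware_auc(truth, pred, eps=0.1))   # tied pair now counted as correct
```

The first call returns 2/3 and the second 1.0: with the larger tolerance, the predictions 0.90 and 0.85 for the two equally rated items are treated as a tie, matching the true tie. Under this rule a larger ε can also merge predictions for items whose true ratings differ, so the score is not guaranteed to increase with ε in general.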
6.7. Comparison with a trivial recommender
We also compare our three algorithms with a trivial recommender, which always predicts the average of the non-zero rating values in the training data. This baseline is of interest because of the characteristics of our data: many of the non-zero ratings equal 1, with a global mean rating of 1.43 and a standard deviation of 0.70 (before the ratings are normalized to [0, 1]). Such a data property may favor the trivial recommender. In the experiments, we set λ1 = λ2 = λ4 = λ5 = 0.1, λ3 = 1, β1 = β3 = 0.1, β2 = 1, k = 4, θ = e^1 and ε = 0.2. As shown in Table 12, our proposed algorithms outperform the trivial recommender in terms of AUC in most cases: all three do so for location recommendation, while for activity recommendation RPCLAF outperforms it at both training percentages. The advantage is largest in location recommendation, because the rating variance is generally larger there, given that there are more locations than activities. The trivial recommender appears to work well in terms of RMSE, benefiting from the relatively small standard deviation of the rating values, but the best RMSE results of our algorithms are comparable (PCLAF: 0.30 and 0.29, versus 0.27 and 0.26 for the trivial recommender). Finally, although the trivial recommender benefits from this property of our dataset (i.e. many ratings of 1), we expect our algorithms to generalize well to other datasets whose ratings are not necessarily biased toward particular values.
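A minimal Python sketch of the trivial recommender follows, assuming the training and test ratings are stored as dictionaries keyed by hypothetical (user, location, activity) index triples; the names `fit_trivial` and `rmse_constant` are illustrative. It also makes the RMSE argument explicit: for a constant prediction c, the mean squared error equals the variance of the held-out ratings plus the squared gap between c and their mean.

```python
import math
from typing import Dict, Tuple

# Hypothetical encoding of a tensor entry: (user, location, activity) indices.
Entry = Tuple[int, int, int]


def fit_trivial(train: Dict[Entry, float]) -> float:
    """Average of the non-zero training ratings; zero entries (if stored) are ignored."""
    nonzero = [r for r in train.values() if r != 0]
    if not nonzero:
        raise ValueError("training data contain no non-zero ratings")
    return sum(nonzero) / len(nonzero)


def rmse_constant(test: Dict[Entry, float], prediction: float) -> float:
    """RMSE of predicting the same value for every held-out entry."""
    if not test:
        raise ValueError("empty test set")
    squared = [(r - prediction) ** 2 for r in test.values()]
    return math.sqrt(sum(squared) / len(squared))


train_ratings = {(0, 0, 0): 1.0, (0, 1, 0): 1.0, (1, 0, 1): 2.0, (1, 2, 0): 0.0}
test_ratings = {(2, 0, 0): 1.0, (2, 1, 1): 2.0}
mean = fit_trivial(train_ratings)  # (1 + 1 + 2) / 3; the zero entry is skipped
print(mean, rmse_constant(test_ratings, mean))
```

Because every prediction is the same value, a dataset whose ratings cluster around one value keeps this baseline's RMSE low without any modeling of users, locations or activities.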
6.8. Discussion
It is worth noting that, like most existing collaborative filtering work [10,9,17,21], our proposed algorithms make no specific assumption about how the missing data are generated. The collaborative filtering literature generally assumes that values are missing at random; in other words, whether a rating is missing does not depend on the value of that rating or on the values of any other missing ratings. However, recent research such as [24] points out that values are not necessarily missing at random. Consider our problem: if a user's preference for doing some activity a at a location l is low, we are unlikely to have collected data on this pattern. As a result, the missing data in our sample are biased toward “low-rated” user–location–activity entries, which may skew the hold-out evaluation. Marlin and Zemel proposed to model the missing-data generation process with a probabilistic mixture model to address this problem [24]. In the future, we are interested in extending our algorithms along this line.
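The following Python sketch is a toy calculation, with made-up numbers, of why value-dependent missingness can skew a hold-out evaluation. It compares the expected mean of the observed ratings when every rating is observed at the same rate (missing at random) with the case where low ratings are less likely to be observed, as in the scenario above. The rating distribution and the observation probabilities are hypothetical.

```python
from typing import Dict


def observed_mean(rating_probs: Dict[int, float], observe_probs: Dict[int, float]) -> float:
    """Expected mean of the observed ratings.

    rating_probs[r]  : probability that an entry's true rating is r
    observe_probs[r] : probability that an entry with true rating r is observed
    """
    weights = {r: p * observe_probs[r] for r, p in rating_probs.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no rating is ever observed")
    return sum(r * w for r, w in weights.items()) / total


ratings = {1: 0.5, 2: 0.3, 3: 0.2}  # hypothetical true distribution, mean 1.7
mar = {1: 0.1, 2: 0.1, 3: 0.1}      # missing at random: same observation rate
mnar = {1: 0.02, 2: 0.1, 3: 0.3}    # low ratings are rarely observed

print(observed_mean(ratings, mar))   # equals the true mean (up to rounding)
print(observed_mean(ratings, mnar))  # about 2.5, well above the true mean
```

Under missing-at-random observation the observed mean matches the true mean of 1.7, whereas under value-dependent observation it rises to 2.5, so a model evaluated only on observed entries sees a distorted picture of the rating distribution.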
Another interesting problem, to be studied further in the future, is that our way of generating ratings differs from that of traditional rating systems. Recall that we define a user–location–activity rating as the mention count in Eq. (3).
Because users may have omitted some instances of a particular activity at a particular location from their comments,
the ratings in our sample could be lower than the “true” values. This can also lead to some missing data, and we are interested in investigating this issue further in the future.
7. Related work
Little previous work has studied collaborative location and activity recommendation. Most previous work has focused either on recommending specific types of locations [25–28], or on recognizing user activities from sensor data rather than providing location and activity recommendations together [29,30].
7.1. Location recommendation
Location recommendation has been an important topic in geo-related services. Some systems retrieve important locations around an individual user's current location, together with their contexts, for recommendation. For example, [31] presents a mobile application framework that enables a mobile phone user to query geo-coded Wikipedia articles about nearby landmarks. In [32], the Cyberguide system is developed to provide information describing nearby buildings and the people associated with them. In comparison, our system exploits users' location histories and recommends interesting locations across the whole city rather than only nearby ones.
Some systems focus on recommending specific types of locations. For example, in [25], the CityVoyager system is developed to recommend shops. It collects users' shopping histories from GPS logs and uses an item-based collaborative filtering method to recommend shops similar to those a user has previously visited. In [27], a system that considers both users' preferences and location contexts is developed to recommend restaurants; it uses Bayesian learning to compute recommendation values for restaurants and ranks them accordingly. Similarly, in [26], the Geowhiz system uses a user-based collaborative filtering algorithm to recommend restaurants. In [33], the recommended locations are tourist hot spots. A HITS-based model takes into account both a user's travel experience and the interest of a location, so that only locations that are genuinely popular and also recommended by experienced users are recommended. In contrast to these systems, which model only one type of location, our system can handle various types of locations; for example, it can recommend locations not only for food and drink but also for shopping and other activities.
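For readers unfamiliar with HITS-style mutual reinforcement, the Python sketch below shows the generic idea on a hypothetical user–location visit matrix: a user's hub score (standing in for travel experience) sums the authority scores (standing in for location interest) of the locations the user visited, and a location's authority score sums the hub scores of its visitors, with normalization after each step. It is a textbook HITS iteration, not the specific model of [33].

```python
import math
from typing import List, Tuple


def hits(visits: List[List[int]], iterations: int = 50) -> Tuple[List[float], List[float]]:
    """Generic HITS on a users x locations 0/1 visit matrix.

    Returns (hub score per user, authority score per location),
    each normalized to unit Euclidean length.
    """
    n_users, n_locs = len(visits), len(visits[0])
    hub = [1.0] * n_users
    auth = [1.0] * n_locs
    for _ in range(iterations):
        # A location scores highly if high-hub (experienced) users visit it.
        auth = [sum(hub[u] * visits[u][l] for u in range(n_users)) for l in range(n_locs)]
        norm = math.sqrt(sum(a * a for a in auth)) or 1.0
        auth = [a / norm for a in auth]
        # A user scores highly if they visit high-authority (interesting) locations.
        hub = [sum(auth[l] * visits[u][l] for l in range(n_locs)) for u in range(n_users)]
        norm = math.sqrt(sum(h * h for h in hub)) or 1.0
        hub = [h / norm for h in hub]
    return hub, auth


visits = [
    [1, 1, 1],  # user 0 visited all three locations
    [1, 0, 0],  # user 1 visited only location 0
    [1, 1, 0],  # user 2 visited locations 0 and 1
]
hub, auth = hits(visits)
print(hub, auth)
```

Location 0, visited by every user, receives the highest authority score, and user 0, who visited every location, receives the highest hub score.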
7.2. Activity recommendation
Activity recommendation is a relatively new research topic that has received little attention so far [34]. Yet asking what we can do when visiting some place is a common question in daily life. Most previous work related to this topic focuses on recognizing activities from sensor data such as GPS [35], RFID [36], motion sensors [37] or WiFi [38] in ubiquitous computing [16].
Early activity recognition algorithms are based on logic and are usually described as a logical inference process over a set of first-order statements [39]. However, with the development of sensor technology, these logic-based approaches were found to be limited in modeling the uncertainty and noise of sensor data. As a result, learning-based algorithms were introduced to model the relationships between sensor observations and activities. For example, in [29], a Hidden Markov Model is used to model sequential object-sensor observations for fine-grained activity recognition, while in [30], a supervised decision tree is used to recognize activities of daily living (ADLs). Note that most of these studies consider only fine-grained activity recognition in indoor environments and do not use user location histories to model users' activities in outdoor environments. In this paper, we show how to parse users' GPS location data and use them, together with the mined location features and activity correlations, to provide coarse-grained activity recommendations, both indoor and outdoor, in response to location queries.
Other work related to outdoor activity recognition includes [35,40,41]. For example, in [35], a supervised hierarchical conditional random field model is applied to GPS data to recognize whether a user is at work, sleeping at home, visiting friends, and so on. The studies in [40] and [41] are both based on the Reality Mining project at MIT, which uses mobile phones as sensors to record users' movements and social behaviors. Unsupervised learning algorithms, such as Principal Component Analysis (PCA), Latent Dirichlet Allocation (LDA) and the Author Topic Model (ATM), are applied to users' location data to discover frequent patterns in their activities. Compared with these studies, our work not only predicts which activities suit a given location, but also integrates this prediction with location recommendation.
8. Conclusion
In this paper, we studied how to use real-world GPS data to retrieve relevant mobile information for answering two
typical questions. The first question is, if we want to do something, where shall we go? This question corresponds to
location recommendation. The second question is, if we visit some place, what can we do there? This question corresponds
to activity recommendation. We show that these two questions are inherently related, as they can be seen as a collaborative
filtering problem in a user–location–activity rating tensor. We propose three algorithms to solve this problem. The first one,
CLAF, is a matrix-based CF model that aims to minimize the square loss of missing-entry predictions in the location–activity matrix (without modeling users) [3]. The second one, PCLAF, is a tensor-based CF model that aims to minimize the square loss of missing-entry predictions in the user–location–activity tensor [4]. Compared with CLAF, PCLAF takes users into account in the optimization and can thus provide personalized recommendations. The third one, RPCLAF, is a tensor-based CF model that aims to minimize a ranking loss on missing entries in the user–location–activity tensor. Compared with CLAF and PCLAF, the newly proposed RPCLAF model treats recommendation as a ranking problem and thus directly optimizes ranking performance.
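The difference between the square-loss objectives of CLAF and PCLAF and the ranking objective of RPCLAF can be illustrated with a small Python sketch. It uses a generic pairwise logistic ranking loss in the style of BPR [21], which skips tied pairs and omits the tie parameter θ, so it is a simplification rather than the RPCLAF objective; the function names are hypothetical. Shifting every prediction by a constant leaves the ranking unchanged: the ranking loss stays the same, while the square loss grows.

```python
import math
from itertools import combinations
from typing import Sequence


def square_loss(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Sum of squared errors, the kind of objective CLAF and PCLAF minimize."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred))


def pairwise_ranking_loss(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Sum of -log(sigmoid(pred_hi - pred_lo)) over pairs with truth_hi > truth_lo.

    Pairs with equal true ratings are skipped (no tie modeling in this sketch).
    """
    loss = 0.0
    for i, j in combinations(range(len(truth)), 2):
        if truth[i] == truth[j]:
            continue
        hi, lo = (i, j) if truth[i] > truth[j] else (j, i)
        # -log(sigmoid(d)) == log(1 + exp(-d))
        loss += math.log1p(math.exp(-(pred[hi] - pred[lo])))
    return loss


truth = [1.0, 0.5, 0.0]
pred = [0.9, 0.4, 0.1]
shifted = [p + 0.5 for p in pred]  # same order, larger absolute errors
print(square_loss(truth, pred), square_loss(truth, shifted))
print(pairwise_ranking_loss(truth, pred), pairwise_ranking_loss(truth, shifted))
```

The square loss rises from 0.03 to 0.68 under the shift, while the ranking loss is unchanged, since it depends only on differences between predictions.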
Because the user–location–activity tensor is very sparse in practice, we also propose to exploit additional information from various sources, including user–user similarities, location features, activity–activity correlations and user–location visiting preferences, to improve performance. We extensively evaluated our system on a real-world GPS dataset and showed that our three algorithms consistently outperform six competing baselines. In particular, on average (see footnote 7), our newly proposed RPCLAF algorithm achieves at least a 7% improvement in location recommendation (in terms of AUC score) and a 10% improvement in activity recommendation over the best of these six baselines. Moreover, on average, RPCLAF achieves at least a 6% improvement in both location and activity recommendation over the best of our two previous algorithms, CLAF and PCLAF.
In the future, we will consider more external information, such as the time or sequence information of the trajectories, to provide more constraints for recommendation. We are also interested in updating our models online as users continuously accumulate data, and in integrating our models with cloud computing platforms so as to handle a large number of users.
Acknowledgements
We thank Hong Kong RGC project 621010 for supporting the research. We also thank the anonymous reviewers for their
helpful comments and constructive suggestions.
References
[1] P. Symeonidis, A. Nanopoulos, Y. Manolopoulos, Tag recommendations based on tensor dimensionality reduction, in: Proc. of the ACM Conference on
Recommender Systems, 2008, pp. 43–50.
[2] A. Cichocki, R. Zdunek, A.H. Phan, S.-i. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multiway Data Analysis and
Blind Source Separation, Wiley, 2009.
[3] V.W. Zheng, Y. Zheng, X. Xie, Q. Yang, Collaborative location and activity recommendations with GPS history data, in: Proc. of the 19th International
World Wide Web Conference (WWW ’10), ACM, New York, NY, USA, 2010.
[4] V.W. Zheng, B. Cao, Y. Zheng, X. Xie, Q. Yang, Collaborative filtering meets mobile recommendation: A user-centered approach, in: Proc. of the 24th
AAAI Conference on Artificial Intelligence (AAAI’10), Atlanta, Georgia, USA, 2010, pp. 236–241.
[5] Y.-F. Chen, G. Di Fabbrizio, D. Gibbon, R. Jana, S. Jora, B. Renger, B. Wei, GeoTV: navigating geocoded RSS to create an IPTV experience, in: Proc. of the
16th International Conference on World Wide Web (WWW ’07), 2007.
[6] L. Liao, D. Fox, H.A. Kautz, Learning and inferring transportation routines, Artificial Intelligence (2007) 311–331.
[7] Y. Zheng, X. Zhou (Eds.), Computing with Spatial Trajectories, Springer, 2011.
[8] Y. Zheng, X. Xie, W.-Y. Ma, GeoLife: A collaborative social networking service among user, location and trajectory, IEEE Database Eng. Bull. (2010).
[9] J. Wang, A.P. de Vries, M.J.T. Reinders, Unifying user-based and item-based collaborative filtering approaches by similarity fusion, in: Proc. of the 29th
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’06), 2006, pp. 501–508.
[10] N. Srebro, T. Jaakkola, Weighted low-rank approximations, in: Proc. of the 20th International Conference on Machine Learning (ICML ’03), 2003, pp. 720–727.
[11] Y. Zheng, L. Liu, L. Wang, X. Xie, Learning transportation mode from raw GPS data for geographic applications on the web, in: Proc. of the 17th
International Conference on World Wide Web (WWW ’08), 2008, pp. 247–256.
[12] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, W.-Y. Ma, Mining user similarity based on location history, in: Proc. of the 16th ACM SIGSPATIAL International
Conference on Advances in Geographic Information Systems (GIS ’08), 2008, pp. 1–10.
[13] M. Ankerst, M.M. Breunig, H.-P. Kriegel, J. Sander, OPTICS: ordering points to identify the clustering structure, SIGMOD Rec. 28 (2) (1999) 49–60.
[14] K. Nigam, A.K. McCallum, S. Thrun, T. Mitchell, Text classification from labeled and unlabeled documents using EM, Mach. Learn. 39 (2000) 103–134.
[15] C.D. Manning, P. Raghavan, H. Schutze, Introduction to Information Retrieval, Cambridge University Press, 2008.
[16] V.W. Zheng, D.H. Hu, Q. Yang, Cross-domain activity recognition, in: Proc. of the 11th International Conference on Ubiquitous Computing (UbiComp
’09), 2009, pp. 61–70.
[17] A.P. Singh, G.J. Gordon, Relational learning via collective matrix factorization, in: Proc. of the 14th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD ’08), 2008, pp. 650–658.
[18] G. Adomavicius, R. Sankaranarayanan, S. Sen, A. Tuzhilin, Incorporating contextual information in recommender systems using a multidimensional
approach, ACM Trans. Inf. Syst. 23 (2005) 103–145.
[19] N.N. Liu, M. Zhao, Q. Yang, Probabilistic latent preference analysis for collaborative filtering, in: Proc. of the 18th ACM Conference on Information and
Knowledge Management, CIKM ’09, ACM, New York, NY, USA, 2009, pp. 759–766.
[20] Y. Hu, Y. Koren, C. Volinsky, Collaborative filtering for implicit feedback datasets, in: Proc. of the 8th IEEE International Conference on Data Mining,
ICDM ’08, IEEE Computer Society, Washington, DC, USA, 2008, pp. 263–272.
[21] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in: Proc. of the 25th Conference
on Uncertainty in Artificial Intelligence, UAI ’09, 2009.
[22] K. Zhou, G.-R. Xue, H. Zha, Y. Yu, Learning to rank with ties, in: Proc. of the 31st Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, SIGIR ’08, ACM, New York, NY, USA, 2008, pp. 275–282.
7 Under different percentages of training data used.
[23] J.L. Herlocker, J.A. Konstan, A. Borchers, J. Riedl, An algorithmic framework for performing collaborative filtering, in: Proc. of the 22nd Annual ACM
SIGIR Conference, SIGIR ’99, ACM, New York, NY, USA, 1999, pp. 230–237.
[24] B.M. Marlin, R.S. Zemel, Collaborative prediction and ranking with non-random missing data, in: Proc. of the Third ACM Conference on Recommender Systems, RecSys ’09, ACM, New York, NY, USA, 2009, pp. 5–12.
[25] Y. Takeuchi, M. Sugimoto, CityVoyager: An outdoor recommendation system based on user location history, in: Proc. of Ubiquitous Intelligence and
Computing, 2006, pp. 625–636.
[26] T. Horozov, N. Narasimhan, V. Vasudevan, Using location for personalized POI recommendations in mobile environments, in: Proc. of the International
Symposium on Applications on Internet, 2006, pp. 124–129.
[27] M.-H. Park, J.-H. Hong, S.-B. Cho, Location-based recommendation system using Bayesian user’s preference model in mobile devices, in: Proc. of
Ubiquitous Intelligence and Computing, 2007, pp. 1130–1139.
[28] H. Yoon, Y. Zheng, X. Xie, W. Woo, Smart itinerary recommendation based on user-generated GPS trajectories, in: Proc. of Ubiquitous Intelligence and
Computing (UIC ’10), 2010.
[29] D.J. Patterson, D. Fox, H.A. Kautz, M. Philipose, Fine-grained activity recognition by aggregating abstract object usage, in: Proc. of the 9th IEEE International Symposium on Wearable Computers (ISWC ’05), IEEE Computer Society, Washington, DC, USA, 2005, pp. 44–51.
[30] M.R. Hodges, M.E. Pollack, An ‘object-use fingerprint’: The use of electronic sensors for human identification, in: Proc. of the 9th International Conference on Ubiquitous Computing (UbiComp ’07), 2007, pp. 289–303.
[31] R. Simon, P. Frölich, A mobile application framework for the geospatial web, in: Proc. of the 16th International Conference on World Wide Web (WWW
’07), 2007.
[32] G. Abowd, C. Atkeson, J. Hong, S. Long, R. Kooper, M. Pinkerton, Cyberguide: a mobile context-aware tour guide, Wirel. Netw. (1997) 421–433.
[33] Y. Zheng, L. Zhang, X. Xie, W.-Y. Ma, Mining interesting locations and travel sequences from GPS trajectories, in: Proc. of the 18th International
Conference on World Wide Web (WWW ’09), 2009, pp. 791–800.
[34] V. Bellotti, B. Begole, E.H. Chi, N. Ducheneaut, J. Fang, E. Isaacs, T. King, M.W. Newman, K. Partridge, B. Price, P. Rasmussen, M. Roberts, D.J. Schiano,
A. Walendowski, Activity-based serendipitous recommendations with the Magitti mobile leisure guide, in: Proc. of the 26th Annual SIGCHI Conference
on Human Factors in Computing Systems, CHI ’08, ACM, New York, NY, USA, 2008, pp. 1157–1166.
[35] L. Liao, D. Fox, H.A. Kautz, Location-based activity recognition, in: Proc. of Advances in Neural Information Processing Systems (NIPS ’05), 2005.
[36] D. Wyatt, M. Philipose, T. Choudhury, Unsupervised activity recognition using automatically mined common sense, in: Proc. of the Twentieth National
Conference on Artificial Intelligence (AAAI ’05), 2005, pp. 21–27.
[37] S.S. Intille, K. Larson, E.M. Tapia, J. Beaudin, P. Kaushik, J. Nawyn, R. Rockinson, Using a live-in laboratory for ubiquitous computing research, in: Proc.
of the 4th International Conference on Pervasive Computing (Pervasive ’06), 2006, pp. 349–365.
[38] J. Yin, Q. Yang, J.J. Pan, Sensor-based abnormal human-activity detection, IEEE Trans. Knowl. Data Eng. 20 (8) (2007) 17–31.
[39] H. Kautz, A formal theory of plan recognition, Ph.D. thesis, University of Rochester, 1987.
[40] K. Farrahi, D. Gatica-Perez, What did you do today?: Discovering daily routines from large-scale mobile data, in: Proc. of the 16th ACM International
Conference on Multimedia (ACM MM ’08), 2008, pp. 849–852.
[41] N. Eagle, A. Pentland, Eigenbehaviors: Identifying structure in routine, Behav. Ecol. Sociobiol. (2009) 1057–1066.