1. Information about the paper
Noulas, Anastasios, Salvatore Scellato, Neal Lathia, and Cecilia Mascolo. “Mining user mobility features for next place prediction in location-based services.” In Data Mining (ICDM), 2012 IEEE 12th International Conference on, pp. 1038-1043. IEEE, 2012.
Mobile location-based services are thriving, providing an unprecedented opportunity to collect fine grained spatiotemporal data about the places users visit. This multi-dimensional source of data offers new possibilities to tackle established research problems on human mobility, but it also opens avenues for the development of novel mobile applications and services. In this work we study the problem of predicting the next venue a mobile user will visit, by exploring the predictive power offered by different facets of user behavior. We first analyze about 35 million check-ins made by about 1 million Foursquare users in over 5 million venues across the globe, spanning a period of five months. We then propose a set of features that aim to capture the factors that may drive users’ movements. Our features exploit information on transitions between types of places, mobility flows between venues, and spatio-temporal characteristics of user check-in patterns. We further extend our study combining all individual features in two supervised learning models, based on linear regression and M5 model trees, resulting in a higher overall prediction accuracy. We find that the supervised methodology based on the combination of multiple features offers the highest levels of prediction accuracy: M5 model trees are able to rank in the top fifty venues one in two user check-ins, amongst thousands of candidate items in the prediction list.
2. My review of the paper
This paper tries to predict the next place a user will visit by mining a Foursquare check-in data set. The idea behind it is that multiple factors act together to determine the next place a user visits. It restricts prediction to a next check-in made within 24 hours of the previous one.
The method is summarized as follows:
- Collect check-in data by crawling publicly available check-ins posted on users’ Twitter accounts. The authors collected about 35 million check-ins covering about 5 million geotagged venues over a period of five months.
- Rank the venues most likely to be visited next from a personalized view, weighted by:
  - historical visits: the number of past visits to a given place.
  - categorical preference: the importance of the place type or function, such as cinema, restaurant, mall, and others.
  - social filtering: the set of places visited by friends of the user.
- Rank the venues most likely to be visited next from a global/general view, weighted by:
  - geographic distance
  - rank distance
  - activity transition
  - place transition
- Rank by temporal features, which are:
  - category hour
  - category day
  - place day
  - place hour
- Predict the highest-ranked place by combining those factors. The result is evaluated using percentile rank, average percentile rank, and prediction accuracy.
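The ranking and evaluation steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation. It ranks candidates by a single feature (historical visit count), and all function names are hypothetical. The percentile rank formula, (|L| - rank + 1) / |L| for a candidate list L, is an assumption about the metric rather than something stated in this review.

```python
from collections import Counter


def rank_by_history(past_checkins, candidates):
    """Rank candidate venues by how often the user checked in there before."""
    counts = Counter(past_checkins)
    # Descending visit count; sorted() is stable, so ties keep candidate order.
    return sorted(candidates, key=lambda venue: -counts[venue])


def percentile_rank(ranked, visited):
    """Assumed definition: 1.0 if the visited venue is ranked first, 1/|L| if last."""
    position = ranked.index(visited) + 1
    return (len(ranked) - position + 1) / len(ranked)


def evaluate(test_cases, top_n):
    """Return (average percentile rank, accuracy within the top_n ranked venues)."""
    ranks, hits = [], 0
    for past_checkins, candidates, visited in test_cases:
        ranked = rank_by_history(past_checkins, candidates)
        ranks.append(percentile_rank(ranked, visited))
        hits += visited in ranked[:top_n]
    return sum(ranks) / len(test_cases), hits / len(test_cases)


# Toy data, invented purely for illustration.
candidates = ["cafe", "gym", "cinema", "mall"]
past = ["gym", "cafe", "gym", "mall", "gym", "cafe"]
cases = [(past, candidates, "gym"), (past, candidates, "cinema")]
print(evaluate(cases, top_n=2))  # (0.625, 0.5)
```

In the paper, the individual feature scores are combined with supervised models (linear regression and M5 model trees) rather than used one at a time as in this sketch.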
- This paper introduces a new approach to predicting user visits by combining multiple factors to rank the places most likely to be visited next.
- It has been cited by 38 subsequent works since its publication in 2012, which is a good number.
- Some limitations are already stated in the paper:
  - Prediction accuracy drops during weekends, because people most likely do something different to enjoy their weekend.
  - Predictions are more accurate at noon and in the morning, at 0.65, and less accurate in the evening and at night, where accuracy drops by almost 25%.
- Table 1, which shows the accuracy results from the evaluation, should be placed on page 4 (page 1041), where it is discussed, not on the previous page, where the authors are still describing the theoretical ranking algorithm. That would make the paper easier to read. It does not take much effort to flip back and forth between pages, but it also would not take much effort to move the table to the next page.
- The paper should explain a little more about what an M5 tree is, or why M5 trees are used instead of the other learners available. That would help readers who are not familiar with machine learning. The paper does refer to previous work that explains it, but I think it would not hurt to say a little more about it.
- From my own observation of my friends, and as some technology writers have also discussed, Foursquare users to some extent check in at well-known places just to show off or brag about how cool it is to go there, for example the Statue of Liberty, and they do not check in at less cool places. This could make Foursquare data less relevant, to an extent we cannot measure, for predicting the next place likely to be visited. A user might want to visit cool or popular places visited by friends or others, but the user also has to consider constraints (budget, distance, free time), and the visit might need some planning. In other words, the data is relevant for predicting places that are likely to be visited, but such a place is unlikely to be the very next one visited; it is more likely to be visited at some point in the future (after a month, for example).
- Letting people know where you are, or where your home is, can cause a privacy leak. It is like a great tool for stalkers.
- There is a new idea at http://plancast.com/, which does not predict where you will go. Instead, it lets you publish where you will be, so that friends, colleagues, or other relevant participants can join and meet you there. This is another interesting approach to mobility patterns that is actually useful to users.
- It makes sense that the best performers are:
  - categorical preference (0.84), because people would not go to places outside their categorical preference. For example, those who don’t like to work out are unlikely to visit a gym.
  - place popularity (0.86), because it does drive people’s next check-ins, although the visit might not happen directly after the previous check-in.
  - place hour (0.79), because some venues have limited opening hours, and at night people are most likely at home.
- This method does not count real visits by users, because people do not check in every time they change places. In fact, in the data used in this paper, 50% of users have fewer than 10 check-ins, and only 10% of places have more than 10 check-ins.
- Back in 2009, when Foursquare launched, declaring your location was a necessity, because phones didn’t have the power to reliably pinpoint a user’s location, and Foursquare didn’t have much data on what venues were nearby. By 2014, however, both the technology and the data had finally come of age. Now Foursquare is split in two: one app, called Swarm, is like the old Foursquare app, and the new Foursquare app is dedicated to place discovery, to take on Yelp.