A. Contribution

  1. Problem addressed by the paper

The paper demonstrates that the interests Facebook users list publicly can leak private information about them.

  1. Solution proposed in the paper. Why is it better than previous work?

The authors propose inferring users’ undisclosed (private) attributes from the public attributes of other users who share similar interests. Most previous work relies on friendship links or group memberships to achieve this goal. By contrast, this paper relies only on users’ interests, and it introduces a new approach based on semantic knowledge to demonstrate information leakage through those interests. Moreover, all previous work relied on private datasets (e.g., the dataset of a closed community such as a university) and assumed a different attacker model.

  1. The major results.

The experiments, based on more than 104K public profiles collected from Facebook and more than 2000 private profiles provided by volunteers, show that the proposed inference technique efficiently predicts attributes that users very often hide.

B. Basic idea and approach. How does the solution work?

  1. Creating Interest Descriptions: Interest descriptions are the user-specified interest names augmented with semantically related words which are mined from the Wikipedia ontology.
  2. Extracting semantic correlations between interest descriptions using Latent Dirichlet Allocation (LDA). The output is a set of topics, each containing semantically related concepts.
  3. Computing Interest Feature Vectors (IFV). Based on the discovered topics, LDA also computes the probability that an interest I belongs to Topic_i, for all I and i (Step 3a). The authors then derive the IFV of each user (Step 3b), which quantifies the user's interest in each topic.
  4. Computing the neighbors of each user in the feature space (i.e., the users whose IFVs are similar) to discover similar users, and exploiting this neighborhood to infer hidden attributes.
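
Steps 3b and 4 can be illustrated with a minimal sketch. It is not the authors' implementation: the per-interest topic probabilities (the output of Step 3a) are hard-coded toy values, the noisy-OR aggregation used to build the IFV, the Euclidean distance, and the majority vote over k neighbors are all assumptions made for illustration, and every name below (`interest_feature_vector`, `infer_attribute`, `TOPIC_PROBS`, the interest and attribute labels) is hypothetical.

```python
from collections import Counter
from math import sqrt


def interest_feature_vector(interests, topic_probs, n_topics):
    """Step 3b (assumed noisy-OR): entry i is the probability that at
    least one of the user's interests belongs to topic i."""
    ifv = []
    for i in range(n_topics):
        p_none = 1.0
        for interest in interests:
            p_none *= 1.0 - topic_probs[interest][i]
        ifv.append(1.0 - p_none)
    return ifv


def euclidean(a, b):
    """Distance between two IFVs (assumed metric for this sketch)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def infer_attribute(target_ifv, public_users, k=3):
    """Step 4: majority vote over the k nearest users whose attribute
    is public. public_users is a list of (ifv, attribute_value) pairs."""
    nearest = sorted(public_users, key=lambda u: euclidean(target_ifv, u[0]))[:k]
    votes = Counter(value for _, value in nearest)
    return votes.most_common(1)[0][0]


# Toy stand-in for the LDA output of Step 3a: P(interest belongs to topic i).
TOPIC_PROBS = {
    "interest_a": [0.9, 0.1],
    "interest_b": [0.8, 0.2],
    "interest_c": [0.1, 0.9],
    "interest_d": [0.2, 0.8],
}

# Users who disclose the attribute, given as (interests, attribute value).
public_profiles = [
    (["interest_a", "interest_b"], "value_x"),
    (["interest_a"], "value_x"),
    (["interest_c", "interest_d"], "value_y"),
    (["interest_c"], "value_y"),
]
public_users = [
    (interest_feature_vector(ints, TOPIC_PROBS, 2), attr)
    for ints, attr in public_profiles
]

# A user who hides the attribute but lists one interest publicly.
target = interest_feature_vector(["interest_b"], TOPIC_PROBS, 2)
prediction = infer_attribute(target, public_users, k=3)
print(prediction)
```

In this toy setup the target's IFV lies closest to the two users who disclosed `value_x`, so the vote returns that value even though the target never revealed it.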


C. Strengths

  1. It is among the first works to use users’ interests for profiling.
  2. It could be used by spammers to send targeted ads that reach the intended users effectively.
  3. It does not require frequent model updates because the results are quite stable.
  4. It relies on a free, open, and regularly updated encyclopedia.

D. Weaknesses

  1. The accuracy is still low. It could be improved by combining interests with other information or attributes to build a more accurate inference algorithm.
  2. In the current version of the paper, a declared dislike is falsely interpreted as an interest. For instance, an interest can be created with the title “I hate Michael Jackson”. The paper’s semantics-driven classification will falsely identify the users who list this interest as “Michael Jackson” fans.
  3. It does not capture the intensity of users’ interests. For example, if a user likes both A and B, the inference could be more accurate if it took into account whether the user likes A more than B or vice versa.
  4. It may still fail when the targeted user’s profile is an outlier among the profiles of other users who share the same interests.
  5. It does not offer mitigation techniques that users could apply to protect their accounts from information leakage through their interests, such as making their interests visible only to friends. Nor does it suggest mitigation techniques that Facebook could apply to protect its users’ privacy, such as making attributes that can be used for privacy inference visible only to friends by default.

E. Future work, open issues, possible improvements

  1. The approach could be further improved by using the multilingual editions of Wikipedia to handle interests written in other languages.