IIS 03-24851
Risks to K-Anonymity in Recommender Systems Datasets
Submitted by presnick on Thu, 2006-07-13 23:25.
How secure is personal identity in de-identified data sets?
- To what extent can users be re-identified across the publicly available datasets? The results show that even when movies are identified in forum posts with very simple text-analysis algorithms, 31% of users can be 1-identified (that is, uniquely identified); with more sophisticated, hypothetical text analysis, we estimate that more than 40% of users could be 1-identified.
- How much of a database must be redacted to prevent 1-identification? The results show that more than 80% of the low-popularity items must be removed from the database to reduce 1-identification to near zero. (These items represent only about 20% of the total ratings in the dataset, because each of them has few ratings.)
- Can a user protect herself from 1-identification through careful choice of the items she mentions in the forums? Here the results are mixed. The simple approach of not mentioning some movies is relatively ineffective: a user must leave about 30% of the movies she might like to discuss out of her postings. On the other hand, the subtler approach of misdirection (mentioning movies that other users find interesting) works more readily. This approach raises ethical questions, however, because it redirects re-identification onto some other user.
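The matching idea behind 1-identification, and the effect of redacting rare items, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method from the paper: the ratings dataset is modeled as hypothetical `(user, item)` pairs, movie mentions are assumed to be already extracted from forum posts, the function names are hypothetical, and the strict set-intersection matcher is a deliberate simplification of a real re-identification algorithm.

```python
from collections import Counter, defaultdict


def build_item_index(ratings):
    """Map each item to the set of users who rated it.
    Assumes `ratings` is a list of (user, item) pairs (hypothetical format)."""
    index = defaultdict(set)
    for user, item in ratings:
        index[item].add(user)
    return index


def candidate_users(mentioned_items, item_index, all_users):
    """Simplified set-intersection matcher: keep only users who rated every
    mentioned item. Items absent from the released dataset carry no evidence
    and are skipped. The poster is k-identified with k = len(result);
    k == 1 means she is 1-identified."""
    candidates = set(all_users)
    for item in mentioned_items:
        if item in item_index:  # membership test does not add keys to a defaultdict
            candidates &= item_index[item]
    return candidates


def redact_rare_items(ratings, min_raters):
    """Suppress every item rated by fewer than `min_raters` users.
    Returns (kept_ratings, fraction_of_items_removed, fraction_of_ratings_removed)."""
    counts = Counter(item for _, item in ratings)
    kept = [(u, i) for u, i in ratings if counts[i] >= min_raters]
    removed_items = sum(1 for c in counts.values() if c < min_raters)
    return kept, removed_items / len(counts), 1 - len(kept) / len(ratings)


# Toy data (invented for illustration): "m9" is a rare movie only Alice rated.
ratings = [
    ("alice", "m1"), ("alice", "m2"), ("alice", "m9"),
    ("bob", "m1"), ("bob", "m2"),
    ("carol", "m1"), ("carol", "m3"),
    ("dave", "m1"), ("dave", "m2"), ("dave", "m3"),
]
users = {u for u, _ in ratings}
index = build_item_index(ratings)

print(sorted(candidate_users(["m1", "m2"], index, users)))  # ['alice', 'bob', 'dave']
print(sorted(candidate_users(["m2", "m9"], index, users)))  # ['alice'] (1-identified)

# Publisher-side defense: suppress items with fewer than 2 raters.
kept, item_frac, rating_frac = redact_rare_items(ratings, min_raters=2)
redacted = build_item_index(kept)
print(item_frac, round(rating_frac, 2))  # 0.25 0.1
print(sorted(candidate_users(["m2", "m9"], redacted, users)))  # ['alice', 'bob', 'dave']
```

In this toy data, suppressing the rare movie `m9` removes a quarter of the items but only a tenth of the ratings, and it turns Alice from 1-identified into one of three candidates; withholding `m9` from her own posts has the same effect. A practical matcher must also tolerate noisy mentions, since a poster may discuss movies she never rated, and strict intersection would then drop her from the candidates entirely.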
| Title | Authors | Appears In | Publication Date | Date added |
|---|---|---|---|---|
| You Are What You Say: Privacy Risks of Public Mentions | Frankowski, D., Cosley, D., Sen, S., Terveen, L., Riedl, J. | Proceedings of SIGIR 2006 | 2006 | 07.13.06 |

