Plenty of companies boast that ‘your data is anonymous’ when privacy concerns are raised or when their systems are, inevitably, breached. However, even a few seemingly harmless ‘anonymous’ data points can be enough to identify an individual: one widely cited study found that ZIP code, date of birth and sex alone uniquely identify about 87% of Americans. If the company replaces your details with a unique ID (as AOL did), identification becomes even easier.
“When AOL researchers released a massive dataset of search queries, they first ‘anonymized’ the data by scrubbing user IDs and IP addresses. When Netflix made a huge database of movie recommendations available for study, it spent time doing the same thing. Despite scrubbing the obviously identifiable information from the data, computer scientists were able to identify individual users in both datasets. (The Netflix team then moved on to Twitter users.)
“In AOL’s case, the problem was that user IDs were scrubbed but were replaced with a number that uniquely identified each user. This seemed like a good idea at the time, since it allowed researchers using the data to see the complete list of a person’s search queries, but it also created problems; those complete lists of search queries were so thorough that individuals could be tracked down simply based on what they had searched for. As Ohm notes, this illustrates a central reality of data collection: ‘data can either be useful or perfectly anonymous but never both.’”
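To make the AOL problem concrete, here is a minimal Python sketch of why a stable replacement ID undoes the scrubbing. Everything in it is made up for illustration: the log rows, the ID numbers, the queries and the `group_by_pseudonym` helper are all hypothetical and are not taken from the real AOL dataset.

```python
from collections import defaultdict

# Hypothetical "anonymized" search log. The real user IDs are gone, but each
# user keeps one stable replacement number (all values here are invented).
log = [
    (1001, "landscapers in springfield"),
    (1001, "homes for sale elm street springfield"),
    (1001, "jane doe springfield"),
    (2002, "weather tomorrow"),
    (2002, "cheap flights"),
]


def group_by_pseudonym(rows):
    """Collect every query that was issued under the same replacement ID."""
    profiles = defaultdict(list)
    for pseudo_id, query in rows:
        profiles[pseudo_id].append(query)
    return dict(profiles)


profiles = group_by_pseudonym(log)

# Each replacement ID now maps to one person's complete search history.
for pseudo_id, queries in profiles.items():
    print(pseudo_id, queries)
```

Nothing in the log names anyone, yet a single pass groups every query under 1001 into one profile, and a town, a street and a name searched together narrow the field very quickly. Dropping the replacement ID would prevent this grouping, but it would also remove the per-user view that made the data useful to researchers, which is the trade-off Ohm describes.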