Apparently, “anonymized” clickstream data (the URLs of the websites you visited, and in what order) is available for sale directly from many ISPs. There is no way that this is sufficiently anonymized. It is obvious from reading my clickstream who I am: URLs for MANY online services contain usernames, and anyone who uses any sort of online service is almost certainly visiting their own pages far more often than anyone else's. All it takes is for one of those usernames to be tied to a real name, and your entire clickstream becomes de-anonymized, irreversibly and forever.
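To make that concrete, here is a minimal sketch of how little effort the guess takes. Everything in it is an assumption for illustration: the hosts (`social.example`, `code.example`), the rule that the first path segment is a username, and the sample clickstream are all made up, and a real attacker would use a per-site list of URL patterns.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical hosts whose URLs look like https://host/<username>/...
PROFILE_HOSTS = {"social.example", "code.example"}


def guess_username(clickstream):
    """Return the username that appears most often in profile-style URLs."""
    counts = Counter()
    for url in clickstream:
        parts = urlparse(url)
        if parts.hostname not in PROFILE_HOSTS:
            continue
        # Drop empty segments produced by leading or trailing slashes.
        segments = [s for s in parts.path.split("/") if s]
        if segments:
            counts[segments[0].lower()] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up "anonymized" clickstream: no name attached, only URLs in order.
clickstream = [
    "https://social.example/jdoe42",
    "https://news.example/story/123",
    "https://social.example/jdoe42/followers",
    "https://code.example/jdoe42/project",
    "https://social.example/someone_else",
]

print(guess_username(clickstream))
```

On this sample the most frequent handle is `jdoe42`, which shows up three times against one visit to anyone else. One search for that handle on a site that shows real names would be enough to attach a person to the whole list.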
I’ve talked before about how leaking keys can break anonymization:
Short answer: It is not enough to say that a piece of data is not “personally identifiable” if it is unique and appears alongside a piece of personally identifiable data somewhere else. More importantly, it doesn’t even have to be unique or completely personally identifiable. Whether or not you can guess who a person is from a piece of data is not a black-and-white distinction, and simply being able to narrow down who a person might be leaks information that can confirm their identity when combined with something else.
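A rough sketch of that narrowing effect, under stated assumptions: the four people, their ZIP codes, and their birth years below are invented, and the `candidates` helper is a hypothetical name rather than anything from a real library. Neither attribute identifies anyone on its own, but the combination does.

```python
# Invented reference data an attacker might already hold (e.g. a public list).
people = [
    {"name": "Alice", "zip": "02139", "birth_year": 1985},
    {"name": "Bob", "zip": "02139", "birth_year": 1990},
    {"name": "Carol", "zip": "94110", "birth_year": 1985},
    {"name": "Dave", "zip": "94110", "birth_year": 1990},
]


def candidates(records, **known):
    """Return the names of everyone consistent with the known attributes."""
    return {
        r["name"]
        for r in records
        if all(r.get(key) == value for key, value in known.items())
    }


# Each leaked attribute alone only halves the pool...
by_zip = candidates(people, zip="02139")
by_year = candidates(people, birth_year=1985)

# ...but together they leave exactly one person.
by_both = candidates(people, zip="02139", birth_year=1985)

print(sorted(by_zip), sorted(by_year), sorted(by_both))
```

Here `by_zip` and `by_year` each contain two names, while `by_both` contains only Alice. Neither field would be called “personally identifiable” in isolation, which is the point.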
Preserving anonymity is hard. This is an egregious breach of privacy. Expect lawsuits if this is true.