Adam Fields (weblog)

This blog is largely deprecated, but is being preserved here for historical interest. Check out my index page at adamfields.com for more up to date info. My main trade is technology strategy, process/project management, and performance optimization consulting, with a focus on enterprise and open source CMS and related technologies. More information. I write periodic long pieces here, shorter stuff goes on twitter or app.net.

1/30/2006

More specific Google tracking questions

I asked two very specific questions in a conversation with John Battelle, and he’s received unequivocal answers from Google:

1) “Given a list of search terms, can Google produce a list of people who searched for that term, identified by IP address and/or Google cookie value?”

2) “Given an IP address or Google cookie value, can Google produce a list of the terms searched by the user of that IP address or cookie value?”

The answer to both of them is “yes”.

http://battellemedia.com/archives/002283.php

Tags: , , ,


Flickr pictures, web beacons, and a modest proposal

As I noted in the comments of the previous post, I don’t have ads on the site, but I do have flickr pictures directly linked from my flickr account.

It is conceivable to me that flickr pictures could qualify as “web beacons” under the Yahoo privacy policy, and thus be used for tracking purposes. Presumably, this was not the original intention of the flickr developers, but it’s certainly a possibility now that they’re owned by Yahoo. Are the access logs for the static flickr pictures available to Yahoo? Probably. Are they correlated with other sorts of usage information? It’s not clear. Presumably, flickr pictures are linked in places where standard Yahoo web beacons can’t go, because they’re not invited (like on this site, for example).

I think my conclusion is that this is probably not a problem, but maybe it is. It and other sorts of distributed 3rd party tracking all have one thing in common:

It’s called HTTP_REFERER.

Here’s how it works. When you make a request for any old random web page that contains a 3rd party ad or an image or a javascript library or whatever, your browser fetches the embedded piece of content from the 3rd party. When it does that, as part of the request, it sends the URL of the page you visited as part of the request, in a field called the referer header (yes, it’s misspelled).

So, every time you visit a web page:

  • You send the URL to the owner of the page. So far so good.
  • You send your IP address to the owner of the page. Not terrible in itself.
  • You send the URL of the page you visited to the owner of the 3rd party content. And this is where it starts to degrade a little.
  • You send your IP address to the owner of the 3rd party content. The owner of the 3rd party content may be able to set a cookie identifying you. Modern browsers are set by default to refuse 3rd party cookies. However, if that 3rd party has ever set a cookie on your browser before (say, if you hit their site directly), they can still read it. In any case, you can be identified in some incremental way.
  • The next time you visit another site with content from the same 3rd party, they can probably identify you again.

That referer URL is a significant key that ties a lot of browsing habits together.

There’s an important distinction to be made here. The referer header makes it possible for 3rd party sites to track your content, and it’s only one of many ways. Doing away with the referer header won’t prevent the sites running 3rd party tracking content from doing so. The owner of the site can always send the URL you’re looking at to the 3rd party as part of the request, even if your browser isn’t. However, what this does prevent is tracking without the consent of the owner of the site you’re looking at. Of all of the sites you’re looking at, actually. Judging from my admittedly limited conversations with site owners, there are a LOT of people out there who have no idea that their users can be tracked if they include 3rd party ads on their site, or flickr images, or whatever. (Again, not to say that their users are being tracked, but the possibility is there.)

Again, the site that includes the ad or image or whatever isn’t sending that information – your browser is, and this is a legacy of the early days of the web. Some browsers allow you to turn it off and not send any referer information. I’d argue that this should be off by default, because there disadvantages outweigh the benefits. I’m told that legitimate advertisers don’t rely on the referer header anyway, because it can be unreliable. If that’s true, that’s even less reason to keep it around.

Suggestion number one was “Tracking information that’s linked to personally identifiable information should also be considered personally identifiable“.

Perhaps suggestion two is “Let’s do away with the Referer header”. (Of course, this comes on the heels of a Google-employed Firefox developer adding more tracking features instead of taking them away.)

Arguments for or against? Are there any good uses for this that are worth the potential for abuse?

Tags: , , , , , ,


Powered by WordPress