Adam Fields (weblog)

This blog is largely deprecated, but is being preserved here for historical interest. Check out my index page at for more up to date info. My main trade is technology strategy, process/project management, and performance optimization consulting, with a focus on enterprise and open source CMS and related technologies. More information. I write periodic long pieces here, shorter stuff goes on twitter or


Why all this mucking about with irrevocable licenses?

The Google+ Terms of Service include various provisions to give them license to display your content, and this has freaked out a bunch of professional photographers:

‘By submitting, posting or displaying the content you give Google a perpetual, irrevocable, worldwide, royalty-free, and non-exclusive license to reproduce, adapt, modify, translate, publish, publicly perform, publicly display and distribute any Content which you submit, post or display on or through, the Services.’

I don’t even understand why this is necessary. Why can’t this just be ‘you give us a license to display your content on the service until you delete it’?


Martin’s Letter to his Mother-in-Law

Filed under: — adam @ 3:23 pm

This was posted to a politics discussion list I’m on. I’ve been meaning to write something similar, but he beat me to it. I thought it was good, so here it is reprinted with permission:


Over the years we have known one another, I’ve steadfastly avoided
discussions of politics and religion, and I think that there has been a good
attitude and an agreement to disagree on several key issues of politics and religion.

For the first time in all these years, I’m going to break with that, as
unlike in any time in my life, I think that I have an obligation to reach
out to people close to me to make a case for Barack Obama.

I ask you to step back from some of the specific issues where the
Republicans have long held their base in steadfast opposition to the
Democrats – issues that more often than not are decided on a
state-by-state level. Divisive issues, like abortion, like gun ownership,
like tax policy – these issues are the classic “party line” issues that have
long divided this nation.

I ask that you look at the reality of what the Republican party has become,
not what they say, look at what they do. They are nothing at all like the
small government, fiscal conservatives they claim to be.

You know this – you must know it, you must see it. The deficit that we have
now – without the bailout of Wall Street, without the cost of the Iraq war,
without the tax breaks for Exxon, amounts to me and my children working
several months of every year for our entire lives just to pay the debt down.
Add in bailing out Wall Street and all of the other corporate tax breaks and
it gets worse.

So I ask you to look at John McCain as a man of the Republican party, in the
context of your own personal situation, as well as the state of the nation.

Following the economic policies of the current
Republican/Neoconservative party (and they are nothing at all like
actual Conservatives), we have seen:

- The ascendancy of China through trade policies that sacrificed American
workers for cheap Chinese labor producing deadly toxic goods.

- Wasting the lives of our military men and women chasing phantoms in Iraq
while Bin Laden walks free in Afghanistan, where we should be.

- Supporting tax breaks for Exxon – a company with the largest profits of
any company in history – while denying tax credits for alternative energy

- Allowing United Airlines and other companies to gut the provisions of
their pension plans, while letting their CEOs walk away with millions.

- The highest profits ever recorded by the health insurance companies and
pharma industry and the largest number of uninsured Americans unable to get
basic preventative health care.

- The systematic looting of the financial system, leaving you, me, our kids,
grandkids and great-grandkids with the largest deficit ever recorded
outside of wartime (and soon to surpass that)

- The overall REDUCTION in take-home wages in the middle class

- The largest INCREASE in income for the top 1% of Americans ever known

- The most authoritarian government, with the most egregious violations of
the constitution ever seen, eradicating one basic right after the other.

- And, as of now, the attempt at the establishment of a
government-controlled financial system that absolutely DWARFS anything you
could ever find in Europe and is nothing less than a gift to the bosses of
Wall Street for their unspeakable economic crimes against America.

There’s more – much more – that I could go into, and while the Democrats are
hardly free of all blame for this, their biggest crime for the last 8 years
was not working hard enough to get elected.

The Neoconservative movement that hijacked the Republican Party leaves
nothing like the party of Lincoln or even of Richard Nixon, who by
comparison was a rank amateur to Karl Rove and company.

I know that Obama is a flawed candidate, and I do have some concerns about
his “experience” in some areas. For example, I’m a strong supporter of
individual gun ownership, I don’t think that public education works at all
anymore (but it can be fixed, maybe), and I grit my teeth when I see that
Obama voted for FISA, which essentially killed the Fourth Amendment.

I have the recent Supreme Court decision in District of Columbia v. Heller on the
Second Amendment to back me up on the guns, and I have hope that FISA and its
cohorts will be fixed one day. I think they can be.

There are other minor issues where I don’t think Obama is right, and on some
I’m in basic opposition. But there’s enough on the big picture items – the
economic policies, the technology, the plain old fashioned politics of
compromise to move everyone forward a little bit rather than moving some
ahead at the expense of others – a country where we can at least get to
something less punitive on the working person.

I’ve seen the results of the McCain “experience” of the last 8 years (plus
his many years in Washington DC), and if you can honestly say that you feel
better about the future of America today – after 8 years of the Bush
doctrine – which McCain supported 100%, never once voting against a proposed
Bush item – then do what you must.

McCain is no stranger to financial mis-deeds, with his previous involvement
in the Savings and Loan collapse (he was a member of the infamous “Keating
5”) and his wife’s business involvements with the Arabs are many and deep.
He’s not a “man of the people” at all, he’s a company man, through and through.

If you think CEOs are over-paid, if you think that the richest people
should pay their fair share of taxes, if you think that the government
should be by, for and of The People, not the corporations, please, I beg
you, vote for Obama.

If you want to see your Grandchildren grow up in a world where America is
actually a kinder, gentler nation, where we take responsibility first for
our people and planet, not for the few oligarchs at the head of the Fortune
500 – where you and I will pay less taxes than a big company with all kinds
of tax breaks – where my children will be free to ask questions of their
leadership, to gather in protest to get redress for their grievances without
fear of being tasered or beaten for the “crime” of peaceful protest, where
their Vice President is competent to step into the job of President at a
moment’s notice, I beg you, vote for Obama.

Although you know I am not a religious man, I ask you to consider your
faith, and to ask yourself, would Jesus condone a man who, like John
McCain, abandoned his wife after she was crippled in a car wreck and married
another woman one month later? Is it Christian to condone the torture of
another human being?

Help your fellow man. That’s a basic rule, isn’t it?

Consider his position on health care, from

“John McCain Believes The Key To Health Care Reform Is To Restore Control To
The Patients Themselves. We want a system of health care in which everyone
can afford and acquire the treatment and preventative care they need. Health
care should be available to all and not limited by where you work or how
much you make. Families should be in charge of their health care dollars and
have more control over care.”

Sorry, but that’s absolute BULLSHIT. A system where “Families should be in
charge of their health care dollars and have more control over care” still
places the financial burden of health care on FAMILIES and profits in the
pockets of insurance companies. Read the words the man has posted. His
proposal has nothing – nothing at all – that will change anything in health
care. You saw the medical bills from the car accident with your own children
20 years ago – today, that would be a million-dollar accident, and what
average FAMILY is in control of $1,000,000 they can spend on healthcare?

OH WAIT – A family where they are not sure how many houses they own, but
they might have written it down somewhere in the family jet (which costs
$6,000 an HOUR to operate, by the way). McCain’s Family isn’t our family,
it’s not your family, his financial reference points have nothing to do with
yours or mine and simply can’t.

How about High tech – an area where I’m certainly interested, but I’m more
interested on behalf of my kids. We’re 25th in the world in terms of
high-speed internet. And falling behind places like Bolivia and Argentina and
even Latvia. LATVIA!!! We have poorer internet and mobile phone
infrastructure than LATVIA!

But McCain’s site says:

“John McCain is uniquely qualified to lead our nation during this
technological revolution. He is the former chairman of the Senate Committee
on Commerce, Science and Transportation. The Committee plays a major role in
the development of technology policy, specifically any legislation affecting
communications services, the Internet, cable television and other
technologies. Under John McCain’s guiding hand, Congress developed a
wireless spectrum policy that spurred the rapid rise of mobile phones and
Wi-Fi technology that enables Americans to surf the web while sitting at a
coffee shop, airport lounge, or public park.”

Let’s see: John McCain was on the committee that has been
responsible for “legislation affecting communications services, the
Internet, cable television and other technologies”.

Uh-huh. And in the intervening years, we’ve seen the USA open up the
commercial Internet, and then fall behind. We have one of the WORST mobile
phone infrastructures in the world. In Japan, you can do 2-way video
conferencing from your mobile phone. Home internet connections there are 10 times
faster than the fastest connection in the USA and cost $29 a month. So,
what’s the result of a technology policy from a guy who has never used
Google? – oh wait – he called it The Google. Give me a break. You’re more
technologically advanced than McCain – far more.

OK, I’ll stop with the policy for policy comparison and leave you with this.

Do you remember JFK? Do you remember when in the USA it seemed that there
was potential – where there was more we could do as a nation than just what
I want or what you personally want? I can’t. You know why? Because the very
first memory I have of politics was the Watergate scandal. I watched Nixon
resign on live TV August 8th 1974. I was exactly the same age Martin is now,
and all I remember was that the President was a crook who lied and left the
office in shame. His crony Jerry Ford let him walk free. That’s the first
political memory I have.

Today, Martin asks why I support Obama, and I tell him that it’s because
Obama isn’t what we’ve had in office for almost his whole life – a failed
administration, with policies that have curtailed my economic opportunities,
that have cost me my ability to save for his future, as my salary has not
increased (in inflation adjusted dollars), since the day he was born (in
fact, accounting for inflation I make about 15% LESS than the day Martin was
born in 1999), while in the same period, billionaires have been created with
the money made from the productivity gains and lowered salaries of American
workers. Thanks to our “free market health care system,” this year I spent
more on healthcare than on any other expense, including my mortgage. Since 1999
I’ve seen us go to war against a nation that never attacked us, I’ve seen
the steady erosion of my civil liberties in the name of “freedom” and I’ve
seen no end to the rise of corporate power while my own rights have been

I don’t want my kids to grow up with any more years of that, and McCain,
who has always voted in favor of Bush policies, without exception, is
cut from the same Neoconservative cloth. He’s no Maverick, he’s no
leader, he’s a member of the power elite and he is absolutely corrupt to the
last creaky bone of his decaying body.

I won’t even bother with the issues that come from picking the basic
equivalent of a Township supervisor as a running mate, other than to say
that “likeable” is a great quality, but it does not make up for a total lack
of competence.

My kids have their whole lives to lead. I believe in my heart that with
Obama in office they will have a better life, and a better future than with
McCain. Please consider this on Election Day.


The Google Chrome terms of service are hilarious

I’ve been very busy lately, but this is just too much to not comment on.

There are other articles about how the Google Chrome terms of service give Google an irrevocable license to use any content you submit through “The Services” (a nice catchall term which includes all Google products and services), but the analysis really hasn’t gone far enough – that article glosses over the fact that this applies not only to content you submit, but also content you display. Of course, since this is a WEB BROWSER we’re talking about, that means every page you view with it.

In short, when you view a web page with Chrome, you affirm to Google that you have the right to grant Google an irrevocable license to use it to “display, distribute and promote the Services”, including making such content available to others. If you don’t have that legal authority over every web page you’ve visited, you’ve just fraudulently granted that license to Google and may yourself be liable to the actual copyright owner. (If you do, of course, you’ve just granted them that license for real.) I’m not a lawyer, but I suspect that Google has either committed mass inducement to fraud or the entire EULA (which lacks a severability clause) is impossible to obey and therefore void. [Update: there is a severability clause in the general terms, which I missed on the first reading. Does that mean that the entire content provisions would be removed, or just the parts that apply to the license you grant Google over the content you don't have copyright to? I don't know.]

Even more so than usual, these terms are, quite frankly, ridiculous and completely inappropriate for not only a web browser but an open source web browser.

Nice going, guys.


New social networking features on Confabb launched today

Filed under: — adam @ 2:41 pm

I’m extremely proud of the Confabb site, and it’s nice to see it evolving past the limited feature set we were able to squeeze in before launch. There’s a LOT more great stuff coming. The development team has been working very hard for the past few months, and a bunch of new social networking features went live today.

From the press release:

New Logged-in Homepage

Log in and check out ‘your new homepage.’ Above ‘your conferences’ is the new ‘your network,’ a bird’s-eye view of bulletin board messages from within your network (more on that below), your online Confabb connections and any messages sent to you by those within the Confabb community. Click on ‘My Account’ to see the full range of search and connection possibilities. Post your own messages for everyone to see on “your bulletin board,” which will be broadcast globally—Confabb pings no fewer than 68 of the major alerting services—or have a one-on-one discussion with other Confabb members. You can also see what others are talking about and invite new people, either from within or outside of Confabb, to join your network.

New Search!

There are two new forms of search on the site (you’ll all remember that the search function was Confabb’s Achilles Heel when we launched). There is now an advanced search for conferences which drills down into multiple parameters such as location, keyword, category and when the show date starts and stops. That nullifies one of the biggest knocks we got at launch. People will love it. We’ve also added a “User Search” which lets Confabb users search for and connect with other Confabb community members. Of course that sets us up for connecting people within the community and that’s the best part.

My Connections (or “buddy lists”)

Just as you keep a list of people with whom you correspond daily, the “My Connections” tab is your gateway to the personal contacts you’ve made within the Confabb community–people with whom you’ve connected before and want to stay in touch with going forward. This is your personal network: friends, colleagues and other contacts whose whereabouts and doings you want to follow as they prepare for and attend events. Attendees can view a list of other conference participants, check out their profiles, invite them into their personal network and email them directly through Confabb’s personal messaging feature.

Personal Messaging

This is the Confabb community’s personal email service. We respect everyone’s right to privacy so messaging within the community is handled by us; simply use the “contact” link to jot a note to the person of your choice and we’ll send the message to the email that person has registered within our system. Responses are handled by us as well so your information is never revealed unless you choose to do that outside of the community.


This is cool. “Media” is just that: everything that interests you from across the web, from text-based articles and links to photos, RSS feeds for breaking information and even full blown videos. The content comes from the web’s leading sources of open information, including Technorati, Google and Yahoo!, Feedster, Flickr and YouTube. Simply click the “Media” tab at the top of the navigation bar and find information on just about anything by searching for the subject’s name or the subject’s tag in the desired content source. The Media tab lets you experience the conference through everyone else’s eyes, and they experience it through content you create, find and share with them.

Bulletin Boards

Confabb now provides all of its users with their own personal blogs, or bulletin boards, from which they can share their thoughts, opinions on the issues and experiences. This is the community member’s space; it’s intensely individual, consisting of the member’s content and comments from their readers. People can also read the musings of others within their network by clicking on the “Bulletin Board Posts within My Network” tab, which shows what others within their network are saying too.

Each board–the individual blog and the personal network bulletin board–is completely searchable by the major search engines. You will build traffic from within the community as well as from anyone around the globe with an interest in what you have to say!



Google has just bought a lot of browsing history of the internet

I pointed out that YouTube was a particularly valuable acquisition to Google because their videos are the most embedded in other pages of any of the online video services. When you embed your own content in someone else’s web page, you get the ability to track who visits that page and when, to the extent that you can identify them. This is how Google Analytics works – there’s a small piece of JavaScript loaded into the page which is served from one of Google’s servers, and then every time someone hits that page, they get the IP address, the URL of the referring page, and whatever cookies are stored with the browser for the domain. As I’ve discussed before, this is often more than enough information to uniquely identify a person with pretty high accuracy.
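To make that concrete, here is a rough sketch of what the server behind an embedded script or video gets to see on each request. This is not Google's code or any real service's logging; the function name, the header values, and the `visitor_id` cookie are all made up for illustration, and the only assumption is the ordinary behavior of a browser fetching a third-party resource.

```python
from http.cookies import SimpleCookie


def visible_to_embedder(client_ip, headers):
    """Collect what a third-party server can log when a page embeds its content.

    client_ip comes from the network connection itself; headers are the
    HTTP request headers the browser sends along with the fetch.
    """
    cookie = SimpleCookie()
    cookie.load(headers.get("Cookie", ""))
    return {
        "ip": client_ip,
        # The Referer header names the page that embedded the resource.
        "embedding_page": headers.get("Referer"),
        # Cookies previously set for the embedder's own domain.
        "cookies": {name: morsel.value for name, morsel in cookie.items()},
    }


# Hypothetical request for an embedded script (all values invented).
request_headers = {
    "Referer": "http://blog.example/some-post",
    "Cookie": "visitor_id=abc123",
}
record = visible_to_embedder("203.0.113.7", request_headers)
print(record)
```

The cookie is the piece that ties separate visits together: the same `visitor_id` shows up no matter which embedding page triggered the request.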

DoubleClick has been doing this for a lot longer than Google has, and they have a lot of history there. In addition to their ad network, Google has also just acquired that entire browsing history, profiles of the browsing of a huge chunk of the web. Google’s privacy policy does not seem to apply to information acquired from sources other than, so they’re probably free to do whatever they want with this profile data.

[Update: In perusing their privacy policy, I noted this: If Google becomes involved in a merger, acquisition, or any form of sale of some or all of its assets, we will provide notice before personal information is transferred and becomes subject to a different privacy policy. This doesn't specify which end of the merger they're on, so maybe this does cover personal information they acquire. I wonder if they're planning on informing everyone included in the DoubleClick database.]



Google to purge some data after 18-24 months

Filed under: — adam @ 6:33 pm

Well, that’s a nice start. Good for them.



Google has your logs (and all it took was a fart lighting video)

The non-obvious side of Google’s purchase of YouTube: Google now has access to the hit logs of every page that a YouTube video appears on, including LOTS of pages that were probably previously inaccessible to them. MySpace pages were probably going to get Google ads anyway, because of the big deal that happened there, but many others weren’t.

Add this to AdSense, the Google Web Accelerator, Google Analytics, and Google Maps, and that’s a lot of data being collected about browsing habits, and the number of sites you can browse without sending some data to Google has just dropped significantly.




Google Government search

I think it’s good that Google is turning a watchful eye on the government, but at the same time somewhat creepy that they’re putting themselves in the position of proxying people’s access to potentially sensitive information. I do NOT think that the Google privacy policy is sufficient to cover this situation.

As many have predicted, this is also likely to expose some interesting accidentally unprotected things at some point in the future.



Better presentation of search results

Filed under: — adam @ 12:40 pm

I just happened to notice that Clusty, which I’ve been using for searching for the past few months (their privacy policy is better than the others, although not perfect, and the results are mostly indistinguishable from Google’s or Yahoo’s), has some neat little buttons next to each result that are totally unobtrusive, to the point that I only even realized they were there today, but also extremely useful.

Two of them are kind of standard (open in a new window, and view the cluster for the search result), but the other one is so mindbogglingly obvious that I’m ashamed that they don’t all do this.

It’s preview. Click it and the link opens up in a small frame underneath the result without leaving the page. Even PDFs.




US Mandatory Data Retention laws are coming

Filed under: — adam @ 9:35 am

Remember the privacy implications of the government asking Google for search data?

It’s going to get worse before it gets better. No online service considers your IP address to be private information, and now they will be required to maintain logs mapping your IP address to real contact information, for a period of at least one year after your account is closed.

The only way to prevent this information from being misused is to not keep it, and now there won’t be any choice.

I’ve discussed this before:



Hidden dangers for consumers – Trojan Technologies

I’ve been collecting examples of cases where there are hidden dangers facing consumers, cases where the information necessary to make an informed decision about a product isn’t obvious, or isn’t included in most of the dialogue about that product. Sometimes, this deals with hidden implications under the law, but sometimes it’s about non-obvious capabilities of technology.

We’re increasingly entering situations where most customers simply can’t decide whether a certain product makes sense without lots of background knowledge about copyright law, evidence law, network effects, and so on. Things are complicated.

So far, I have come up with these examples, which would seem to be unrelated, but there’s a common thread – they’re all bad for the end user in non-obvious ways. They all seem safe on the surface, and often, importantly, they seem just like other approaches that are actually better, but they’re carrying hidden payloads – call them “Trojan technologies”.

To put it clearly, what I’m talking about are the cases where there are two different approaches to a technology, where the two are functionally equivalent and indistinguishable to the end user, but with vastly different implications for the various kinds of backend users or uses. Sometimes, the differences may not be evident until much later. In many circumstances, the differences may not ever materialize. But that doesn’t mean that they aren’t there.

  • Remote data storage. I wrote a previous post about this, and Kevin Bankston of the EFF has some great comments on it. Essentially, the problem is this. To the end user, it doesn’t matter where you store your files, and the value proposition looks like a tradeoff between having remote access to your own files or not being able to get at them easily because they’re on your desktop. But to a lawyer asking for those files, it makes a gigantic difference in whether they’re under your direct control or not. On your home computer, a search warrant would be required to obtain them, but on a remote server, only a subpoena is needed.
  • The recent debit card exploit has shed some light on the obvious vulnerabilities in that system, and it’s basically the same case. To a consumer, using a debit card looks exactly the same as using a credit card. But the legal ramifications are very different, and their use is protected by different sets of laws. Credit card liability is typically geared in favor of the consumer – if your card is subject to fraud, there’s a maximum amount you’ll end up being liable for, and your account will be credited immediately, as you simply don’t owe the money you didn’t charge yourself. Using a debit card, the money is deducted from your account immediately, and you have to wait for the investigation to be completed before you get your refund. A lot of people recently discovered this the hard way. There’s a tremendous amount of good coverage of debit card fraud on the Consumerist blog.
  • The Goodmail system, being adopted by Yahoo and AOL, is a bit more innocuous on the surface, but it ties into the same question. On the face of it, it seems like not a terrible idea – charge senders for guaranteed delivery of email. But the very idea carries with it, outside of the normal dialogue, the implications of breaking network neutrality (the concept that all traffic gets equal treatment on the public internet) that extend into a huge debate raging within the networking community and the government, over such things as VoIP systems, Google traffic, and all kinds of other issues. I’m not sure if this really qualifies in the same league as my other examples, but I wanted to mention it here anyway. There’s a Goodmail/network neutrality overview discussion going on over on Brad Templeton’s blog.
  • DRM is sort of the most obvious. Consumers can’t tell what the hidden implications of DRM are. This is partly because those limitations are subject to change, and that in itself is a big part of the problem. The litany of complaints is long – DRM systems destroy fair use, they’re security risks, they make things complicated for the user. I’ve written a lot about DRM in the past year and a half.
  • 911 service on VoIP is my last big example, and one of the first ones that got me started down this path. This previous post, dealing with the differences between multiple kinds of services called “911 service” on different networks, is actually a good introduction to this whole problem. I ask again: ‘Does my grandmother really understand the distinction between a full-service 911 center and a “Public Safety Answering Point”? Should she have to, in order to get a phone where people will come when she dials 911?’

I don’t have a good solution to this, beyond more education. This facet must be part of the consumer debate over new technologies and services. These differences are important. We need to start being aware, and asking the right questions. Not “what are we getting out of this new technology?“, but “what are we giving up?“.



Google forced to release records by the court

As predicted, U.S. Judge James Ware intends to force Google to hand over the requested data to the DoJ.



Taking advantage of the Commons

Filed under: — adam @ 10:21 am

I received this email in my flickr inbox this morning:

“I am writing to let you know that one of your photos with a creative commons license has been short-listed for inclusion in our Schmap Rome Guide, to be published late March 2006.”

And a link where I was given an opportunity to remove my photo from the queue or approve it for use in their guide. I responded to this before I had my coffee, so I didn’t capture the text from the page as I should have before clicking no. But it had a short blurb of text with something along the lines of “oh, even though some people may disagree, this isn’t really a commercial use, because it’s free to download and the ads support keeping it free”.

I might buy that if there was any sort of community sharing going on here. I don’t see the content of the site being released under a CC license, I see a big fat “All rights reserved” at the bottom of the homepage, and the terms of use (which also, incidentally, say you’re not allowed to use ad blocking software) contain this choice little gem:

The geographic data, photographs, diagrams, maps, points of interest, plans, aerial imagery, text, information, artwork, graphics, points of interest, video, audio, listings, pictures and other content contained on the Site (collectively, the “Materials”) are protected by copyright laws. You may only access and use the Materials for personal or educational purposes and not for resell or commercial purposes by You or any third parties. You may not modify or use the Materials for any other purpose without express written consent of Schmap (“Schmap”). You may not broadcast, reproduce, republish, post, transmit or distribute any Materials on the Site.

This is a gross perversion of what Creative Commons is about. Ad-supported “free” content is commercial (unless Google is “just trying to organize the world’s information and any money collected from selling ads is just helping keep that goal alive”). Taking CC-licensed media from other sources and roadblocking the license while claiming that the use is non-commercial is possibly deceptive.

[Update: there's more discussion on this Flickr Central thread.]



Fun stuff you can find with Google

Filed under: — adam @ 6:01 pm

Nothing really new, but some interesting examples.

The world’s information doesn’t want to be organized.



Storing your files on Google’s server is not a good idea

Filed under: — adam @ 2:29 pm

I was going to write something long about this, but Kevin Bankston of the EFF has beaten me to it and put together pretty much everything I was going to say.

Here’s the original piece:

In response to a criticism on the IP list that this piece was too hard on Google, Kevin wrote the following, which I reproduce here verbatim with permission. I think that this does an excellent job of summing up how I feel about these privacy issues. I have nothing personally against Google, or any of the other companies that I often “pick on” in pointing out potential flaws. I do think that somewhere along the way in getting to where we are now, we have lost some important things in the areas of corporate responsibility and consumer protections, and technology has advanced to the point where it’s not even obvious what has been lost. The tough thing is that there are often tradeoffs with useful functionality, and it’s not clear what you’re giving up in order to make use of that potential new feature.

So, in this case – yeah, it’s great that you can search your files from more than one computer, but Google hasn’t warned you that your doing so by their method, under the current law, exposes your private data to less rigorous protections from search by various parties than it would be if it were left on your own computer. To most people, it doesn’t make any difference where their files are stored. To a lawyer with a subpoena in hand, it does. These are important distinctions, and they’re not being made to the general public. I believe it is the responsibility of those who understand these risks to bring this dialogue to those who don’t. It’s a big part of why I write this blog.

Kevin’s response:

Thanks for your feedback. I’m sorry if you found our press release inappropriately hostile to Google, although I would say it was appropriately hostile–not to Google or its folk, but to the use of this product, which we do think poses a serious privacy risk.

Certainly, the ability to search across computers is a helpful thing, but considering that we are advocating against the use of this particular product for that purpose, I’m not sure why we would include such a (fairly obvious) proposition in the release. And as to tone, well, again, the goal was to warn people off of this product, and you’re not going to do that by using weak language. Certainly, we’re not out to personally or unfairly attack the people at Google. Indeed, we work with them on a variety of non-privacy issues (and sometimes privacy issues, too). But it’s our job to forcefully point out when they are marketing a product that we think is dangerous to consumers’ privacy, and dropping in little caveats about how clever Google’s engineers are or how useful their products can be is unnecessary and counterproductive to that purpose.

I think it’s clear from the PR that our biggest problem here is with the law. But we are also very unhappy with companies–including but not limited to Google–that design and encourage consumers to use products that, in combination with the current state of the law, are bad for user privacy. Google could have developed a Search Across Computers product that addressed these problems, either by not storing the data on Google servers (there are and long have been similar remote access tools that do not rely on third party storage), or by storing the data in encrypted form such that only the user could retrieve it (it is encrypted on Google’s servers now, but Google has the key).

However, both of those design options would be inconsistent with one of Google’s most common goals: amassing user data as grist for the ad-targeting mill (otherwise known, by Google, as “delivering the best possible service to you”). As mentioned in the PR, Google says it is not scanning the files for that purpose yet, but has not ruled it out, and the current privacy policy on its face would seem to allow it. And although I for one have no problem with consensual ad-scanning per se, which technically is not much different than spam-filtering in its invasiveness, I do have a very big problem with a product that by design makes ad-scanning possible at the cost of user privacy. This is the same reason EFF objected to Gmail: not because of the ad-scanning itself, but the fact that Google was encouraging users, in its press and by the design of the product, to never delete their emails even though the legal protection for those stored communications is significantly reduced with time.

If Google wants to “not be evil” and continue to market products like this, which rely on or encourage storing masses of personal data with Google, it has a responsibility as an industry leader to publicly mobilize resources toward reforming the law and actively educating its users about the legal risks. Until the law is fixed, Google can and should be doing its best to design around the legal pitfalls, placing a premium on user privacy rather than on Google’s own access to users’ data. Unfortunately, rather than treating user privacy as a design priority and a lobbying goal, Google mostly seems to consider it a public relations issue. That being the case, it’s EFF’s job to counter their publicity, by forcefully warning the public of the risks and demanding that Google act as a responsible corporate citizen.

Once again, another reason why you should be donating money to the EFF. Do it now.



Detailed survey of verbatim answers from AOL, MS, Yahoo, and Google about what details they store

Declan McCullagh has compiled responses from AOL, Microsoft, Yahoo and Google on the following questions (two of which are nearly verbatim from my previous query, uncredited):

So we’ve been working on a survey of search engines, and what data they keep and don’t keep. We asked Google, MSN, AOL, and Yahoo the same questions:

- What information do you record about searches? Do you store IP addresses linked to search terms and types of searches (image vs. Web)?
- Given a list of search terms, can you produce a list of people who searched for that term, identified by IP address and/or cookie value?
- Have you ever been asked by an attorney in a civil suit to produce such a list of people? A prosecutor in a criminal case?
- Given an IP address or cookie value, can you produce a list of the terms searched by the user of that IP address or cookie value?
- Have you ever been asked by an attorney in a civil suit to produce such a list of search terms? A prosecutor in a criminal case?
- Do you ever purge these data, or set an expiration date of for instance 2 years or 5 years?
- Do you ever anticipate offering search engine users a way to delete that data?



More specific Google tracking questions

I asked two very specific questions in a conversation with John Battelle, and he’s received unequivocal answers from Google:

1) “Given a list of search terms, can Google produce a list of people who searched for that term, identified by IP address and/or Google cookie value?”

2) “Given an IP address or Google cookie value, can Google produce a list of the terms searched by the user of that IP address or cookie value?”

The answer to both of them is “yes”.
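
To make those two questions concrete, here is a minimal sketch of why both lookups become trivial once a log ties each query to an IP address or cookie value. The log format, the function names, and every value in it are invented for illustration; nothing here reflects how Google actually stores or queries its logs.

```python
from collections import defaultdict

# Hypothetical log entries: (identifier, search term). The identifier
# stands in for an IP address or cookie value; all values are made up.
SEARCH_LOG = [
    ("192.0.2.10", "cheap flights"),
    ("192.0.2.10", "divorce lawyer"),
    ("198.51.100.7", "divorce lawyer"),
    ("198.51.100.7", "knitting patterns"),
]


def build_indexes(log):
    """Index the same log both ways: by term and by identifier."""
    by_term = defaultdict(set)
    by_id = defaultdict(set)
    for identifier, term in log:
        by_term[term].add(identifier)
        by_id[identifier].add(term)
    return by_term, by_id


def who_searched_for(by_term, terms):
    """Question 1: given search terms, list the identifiers that used them."""
    found = set()
    for term in terms:
        found |= by_term.get(term, set())
    return sorted(found)


def searches_by(by_id, identifier):
    """Question 2: given an identifier, list the terms it searched for."""
    return sorted(by_id.get(identifier, set()))


BY_TERM, BY_ID = build_indexes(SEARCH_LOG)
```

Calling `who_searched_for(BY_TERM, ["divorce lawyer"])` answers the first question and `searches_by(BY_ID, "192.0.2.10")` answers the second. The point is only that the same log serves both directions once it exists.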


Flickr pictures, web beacons, and a modest proposal

As I noted in the comments of the previous post, I don’t have ads on the site, but I do have flickr pictures directly linked from my flickr account.

It is conceivable to me that flickr pictures could qualify as “web beacons” under the Yahoo privacy policy, and thus be used for tracking purposes. Presumably, this was not the original intention of the flickr developers, but it’s certainly a possibility now that they’re owned by Yahoo. Are the access logs for the static flickr pictures available to Yahoo? Probably. Are they correlated with other sorts of usage information? It’s not clear. Presumably, flickr pictures are linked in places where standard Yahoo web beacons can’t go, because they’re not invited (like on this site, for example).

I think my conclusion is that this is probably not a problem, but maybe it is. It and other sorts of distributed 3rd party tracking all have one thing in common:

It’s called HTTP_REFERER.

Here’s how it works. When you make a request for any old random web page that contains a 3rd party ad or an image or a javascript library or whatever, your browser fetches the embedded piece of content from the 3rd party. When it does that, it sends the URL of the page you visited as part of the request, in a field called the referer header (yes, it’s misspelled).

So, every time you visit a web page:

  • You send the URL to the owner of the page. So far so good.
  • You send your IP address to the owner of the page. Not terrible in itself.
  • You send the URL of the page you visited to the owner of the 3rd party content. And this is where it starts to degrade a little.
  • You send your IP address to the owner of the 3rd party content. The owner of the 3rd party content may be able to set a cookie identifying you. Modern browsers are set by default to refuse 3rd party cookies. However, if that 3rd party has ever set a cookie on your browser before (say, if you hit their site directly), they can still read it. In any case, you can be identified in some incremental way.
  • The next time you visit another site with content from the same 3rd party, they can probably identify you again.

That referer URL is a significant key that ties a lot of browsing habits together.
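
Here is a rough sketch of what that looks like from the 3rd party’s side. It builds the kind of request text a browser sends for an embedded image and shows how a server receiving it could file the referring page under the visitor’s IP address. The class name, host names, and addresses are all made up for illustration, and a real server would of course see many more headers than this.

```python
from collections import defaultdict


def embedded_request(page_url, resource_path, resource_host):
    """Build the request text a browser would send for an embedded resource.

    The Referer line carries the URL of the page being viewed.
    """
    return (
        f"GET {resource_path} HTTP/1.1\r\n"
        f"Host: {resource_host}\r\n"
        f"Referer: {page_url}\r\n"
        "\r\n"
    )


def referer_of(request_text):
    """Pull the Referer value out of raw request text, or None if absent."""
    for line in request_text.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "referer":
            return value.strip()
    return None


class ThirdPartyLog:
    """Hypothetical log kept by whoever serves the embedded content."""

    def __init__(self):
        self.pages_by_ip = defaultdict(list)

    def record(self, client_ip, request_text):
        """File the referring page under the client's IP address."""
        page = referer_of(request_text)
        if page is not None:
            self.pages_by_ip[client_ip].append(page)


# One visitor views two unrelated sites that embed the same image host.
log = ThirdPartyLog()
for page in ("http://blog-one.example/post", "http://shop-two.example/cart"):
    log.record("192.0.2.10", embedded_request(page, "/photo.jpg", "images.example"))
```

After those two page views, `log.pages_by_ip["192.0.2.10"]` holds both page URLs, even though neither site owner sent anything to the image host themselves.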

There’s an important distinction to be made here. The referer header makes it possible for 3rd party sites to track your content, and it’s only one of many ways. Doing away with the referer header won’t prevent the sites running 3rd party tracking content from doing so. The owner of the site can always send the URL you’re looking at to the 3rd party as part of the request, even if your browser isn’t. However, what this does prevent is tracking without the consent of the owner of the site you’re looking at. Of all of the sites you’re looking at, actually. Judging from my admittedly limited conversations with site owners, there are a LOT of people out there who have no idea that their users can be tracked if they include 3rd party ads on their site, or flickr images, or whatever. (Again, not to say that their users are being tracked, but the possibility is there.)

Again, the site that includes the ad or image or whatever isn’t sending that information – your browser is, and this is a legacy of the early days of the web. Some browsers allow you to turn it off and not send any referer information. I’d argue that this should be off by default, because the disadvantages outweigh the benefits. I’m told that legitimate advertisers don’t rely on the referer header anyway, because it can be unreliable. If that’s true, that’s even less reason to keep it around.

Suggestion number one was “Tracking information that’s linked to personally identifiable information should also be considered personally identifiable”.

Perhaps suggestion two is “Let’s do away with the Referer header”. (Of course, this comes on the heels of a Google-employed Firefox developer adding more tracking features instead of taking them away.)

Arguments for or against? Are there any good uses for this that are worth the potential for abuse?



What’s the big fuss about IP addresses?

Given the recent fuss about the government asking for search terms and what qualifies as personally identifiable information, I want to explain why IP address logging is a big deal. This explanation is somewhat simplified to make the cases easier to understand without going into complete detail of all of the possible configurations, of which there are many. I think I’ve kept the important stuff without dwelling on the boundary cases, and be aware that your setup may differ somewhat. If you feel I’ve glossed over something important, please leave a comment.

First, a brief discussion of what IP addresses are and how they work. Slightly simplified, every device that is connected to the Internet has a unique number that identifies it, and this number is called an IP address. Whenever you send any normal network traffic to any other computer on the network (request a web page, send an email, etc…), it is marked with your IP address.

There are three standard cases to worry about:

  1. If you use dialup, your analog modem has an IP address. Remote computers see this IP address. (This case also applies if you’re using a data aircard, or using your cell phone as a modem.)
  2. If you have a DSL or cable connection, your DSL/cable modem has an IP address when it’s connected, and your computer has a separate internal IP address that it uses to only communicate with the DSL or cable modem, typically mediated by a home router. Remote computers see the IP address of the DSL/cable modem. (This case also applies if you’re using a mobile wifi hotspot.)
  3. If you’re directly connected to the internet via a network adapter, your network adapter has an IP address. Remote computers see this IP address.

Sometimes, IP addresses are static, meaning they’re manually assigned and don’t change automatically unless someone changes them (typically, only for case #3). Often, they’re dynamic, which means they’re assigned automatically with a protocol called DHCP, which allows a new network connection to automatically pick up an IP address from an available pool. But just because they can change doesn’t mean they will change. Even dynamic IP addresses can remain the same for months or years at a time. (The servers you’re communicating with also have IP addresses, and they are typically static.)
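
As a simplified illustration of case #2, the sketch below models a home router that hides several internal addresses behind one external address. The class and method names are invented, the addresses are placeholders (the external one comes from a range reserved for documentation), and real address translation involves port mapping and other details that are left out here.

```python
import ipaddress


class HomeRouter:
    """Toy stand-in for a DSL/cable modem plus home router (case 2)."""

    def __init__(self, public_ip):
        # The one address the ISP hands out; placeholder value in the usage below.
        self.public_ip = ipaddress.ip_address(public_ip)

    def outgoing_source(self, internal_ip):
        """Return the source address a remote server sees for this device."""
        internal = ipaddress.ip_address(internal_ip)
        if not internal.is_private:
            raise ValueError(f"{internal} is not an internal address")
        # Every device behind the router appears as the same external address.
        return self.public_ip


router = HomeRouter("203.0.113.5")
laptop_seen_as = router.outgoing_source("192.168.1.20")
desktop_seen_as = router.outgoing_source("192.168.1.21")
```

Both devices come out as the same external address, which is why a remote log entry points at the household’s connection rather than at one particular machine.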

In order to see how an IP address may be personally identifiable information, there’s a critical question to ask – “where do IP addresses come from, and what information can they be correlated with?”.

Depending on how you connect to the internet, your IP address may come from different places:

  • If you use dialup, your modem will get its IP address from the dialup ISP, with which you have an account. The ISP knows who you are and can correlate the IP address they give you with your account. Your name and billing details are part of your account information. By recording the phone number you call from, they may be able to identify your physical location.
  • If you have a DSL or cable connection, your DSL/cable modem will get its IP address from the DSL/cable provider. The ISP knows who you are and can correlate the IP address they give you with your account. Your name and physical location, and probably other information about you, are part of your account information.
  • If you’re using a public wifi access point, you’re probably using the IP address of the access point itself. If you had to log in to an account, your name and physical location, and probably other information about you, are part of your account information. If you’re using someone else’s open wifi point, you look like them to the rest of the internet. This case is an exception to the rest of the points outlined in this article.
  • If you’re directly connected to the internet via a network adapter, your network adapter will get its IP address from the network provider. In an office, this is typically the network administrator of the company. Your network administrator knows which computer has which IP address.

None of this information is secret in the traditional sense. It is probably confidential business information, but in all cases, someone knows it, and the only thing keeping it from being further revealed is the willingness or lack thereof of the company or person who knows it.

While an IP address may not be enough to identify you personally, there are strong correlations of various degrees, and in most cases, those correlations are only one step away. By itself, an IP address is just a number. But it’s trivial to find out who is responsible for that address, and thus who to ask if you want to know who it’s been given out to. In some cases, the logs will be kept indefinitely; in others, they’re destroyed on a regular basis. It’s entirely up to each individual organization.

Up until now, I’ve only discussed the implications of having an IP address. The situation gets much much worse when you start using it. Because every bit of network traffic you use is marked with your IP address, it can be used to link all of those disparate transactions together.

Despite these possible correlations, not one of the major search engines considers your IP address to be personally identifiable information. [Update: someone asked where I got this conclusion. It's from my reading of the Google, Yahoo, and MSN Search privacy policies. In all cases, they discuss server logs separately from the collection of personal information (although MSN Search does have it under the heading of "Collection of Your Personal Information", it's clearly a separate topic). If you have some reason to believe I've made a mistake, I'm all ears.] While this may technically be true if you take an IP address by itself, it is a highly disingenuous position to take when logs exist that link IP addresses with computers, physical locations, and account information… and from there with people. Not always, but often. The inability to link your IP address with you depends always on the relative secrecy of these logs, what information is gathered before you get access to your IP address, and what other information you give out while using it.

Let’s bring one more piece into the puzzle. It’s the idea of a key. A key is a piece of data in common between two disparate data sources. Let’s say there’s one log that records which websites you visit, and it contains only the URL of each website and your IP address. No personal information, right? But there’s another log somewhere that records your account information and the IP address that you happened to be using. Now, the IP address is a key into your account information, and bringing the two logs together allows the website list to be associated with your account information.

  • Have you ever searched for your name? Your IP address is now a key to your name in a log somewhere.
  • Have you ever ordered a product on the internet and had it shipped to you? Your IP address is now a key to your home address in a log somewhere.
  • Have you ever viewed a web page with an ad in it served from an ad network? Both the operator of the web site and the operator of the ad network have your IP address in a log somewhere, as a key to the sites you visited.

The list goes on, and it’s not limited to IP addresses. Any piece of unique data – IP addresses, cookie values, email addresses – can be used as a key.
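
The two-log example above can be sketched in a few lines. Both logs, their field names, and every value in them are invented for illustration; the only thing the sketch is meant to show is that a shared key is all it takes to attach a name to a list of sites.

```python
# Hypothetical log kept by a website: URLs and IP addresses, no names.
VISIT_LOG = [
    {"ip": "192.0.2.10", "url": "http://example.com/medical-advice"},
    {"ip": "192.0.2.10", "url": "http://example.com/job-listings"},
    {"ip": "198.51.100.7", "url": "http://example.com/recipes"},
]

# Hypothetical log kept by a provider: account details and the IP in use.
ACCOUNT_LOG = [
    {"ip": "192.0.2.10", "name": "Pat Example", "street": "1 Sample St"},
]


def join_on_key(visits, accounts, key="ip"):
    """Associate each visit with an account wherever the key values match."""
    accounts_by_key = {row[key]: row for row in accounts}
    profiles = {}
    for visit in visits:
        account = accounts_by_key.get(visit[key])
        if account is None:
            continue  # no matching account for this key, nothing to link
        profiles.setdefault(account["name"], []).append(visit["url"])
    return profiles


PROFILES = join_on_key(VISIT_LOG, ACCOUNT_LOG)
```

Neither log is alarming on its own. `PROFILES` is what you get when someone holds both, and the `key` parameter could just as easily be a cookie value or an email address.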

Data mining is the act of taking a whole bunch of separate logs, or databases, and looking for the keys to tie information together into a comprehensive profile representing the correlations. To say that this information is definitely being mined, used for anything, stored, or even ever viewed is certainly alarmist, and I don’t want to imply that it is. But the possibility is there, and in many cases, these logs are being kept. If they’re not being used in that way now, the only thing really standing in the way is the inaction of those who have access to the pieces, or can get it.

If the information is recorded somewhere, it can be used. This is a big problem.

There are various ways to mask your IP address, but that’s not the whole scope of the problem, and it’s still very easy to leak personally identifiable information.

I’ll start with one suggestion for how to begin to address this problem:

Any key information associated with personally identifiable information must also be considered personally identifiable.

[Update: I've put up a followup post to this one with an additional suggestion.]



Google does keep cookie- and IP-correlated logs

I asked John Battelle the question about whether Google keeps personally identifiable search log information, particularly search logs correlated with IP address. He asked Google PR, who confirmed that they do.

From my comment there, ultimately, this is bad for users. If the information is kept, it’s available for request, abuse, or theft.


Some evidence that Google does keep personally identifiable logs

This article from Internet Week has Alan Eustace, VP of Engineering for Google, on the record talking about the My Search feature.

“Anytime, you give up any information to anybody, you give up some privacy,” Eustace said.

With “My Search,” however, information stored internally with Google is no different than the search data gathered through its Google .com search engine, Eustace said.

“This product itself does not have a significant impact on the information that is available to legitimate law enforcement agencies doing their job,” Eustace said.

This seems pretty conclusive to me – signing up for saved searches doesn’t (or didn’t, in April 2005) change the way the search data is stored internally.


(This was pointed out to me by Ray Everett-Church in the comments of the previous post, covered on his blog:



Does Google keep logs of personal data?

The question is this – is there any evidence that Google is keeping logs of personally identifiable search history for users who have not logged in and for logged-in users who have not signed up for search history? What about personal data collected from Gmail, and Google Groups, and Google Desktop? Aggregated with search? Kept personally identifiably? (Note: For the purposes of this conversation, even though Google does not consider your IP address to be personally identifiable, at least according to their privacy policy, I do.)

It is inarguable that they could keep those logs, but I think every analysis I’ve seen is simply repeating the assumption that they do, based on the fact that they could.

Has there ever been a hard assertion, by someone who’s in a position to know, that these logs do in fact exist?

I have a suspicion about one possible source of all this. Google’s privacy policy used to say (amended 7/2004):

“Google notes and saves [emphasis mine] information such as time of day, browser type, browser language, and IP address with each query.”

But the policy no longer says that. The current version reads: “When you use Google services, our servers automatically record information that your browser sends whenever you visit a website. These server logs may include information such as your web request, Internet Protocol address, browser type, browser language, the date and time of your request and one or more cookies that may uniquely identify your browser.” Again, no information about what’s being done with that data or how long it’s kept.

Given the possibility that they don’t, I think it drastically changes the value proposition of those free subsidiary tools. Obviously, if you ask for your search history to be saved, they’re going to keep it. But maybe that decision is predicated on the assumption that they’re going to keep it anyway, and you might as well have access to it. If the answer is that they’re not keeping it, that’s a different question.

It’s critical to point out that these issues are not even close to limited to Google. Every search engine, every “free” service you give your data to, every hub of aggregated data on the web has the same problems.

Currently, there’s no way to make an informed decision, because privacy policies don’t include specific information about what data is kept, in what form, and for how long. With all of the disclosures in the past year of personal data lost, compromised, and requested, isn’t it time for us to know? In the beginning of the web, having a privacy policy at all was unheard of, but now everybody has one. I don’t think it’s too much to ask of the companies we do business with that the same be done with log retention policies.

I agree with the request to ask Google to delete those logs if they’re keeping them, but I haven’t seen any evidence that they are. Personally, I’d like to know.



Tim Wu article on Google and search engine privacy

Filed under: — adam @ 11:03 am

This is pretty much exactly the point I’ve been trying to make – while Google is commendable for standing up to the government, they created this problem in the first place by aggregating search data.

“Imagine we were to find out one day that Starbucks had been recording everyone’s conversations for the purpose of figuring out whether cappuccino is more popular than macchiato. Sure, the result, on the margin, might be a better coffee product. And, yes, we all know, or should, that our conversations at Starbucks aren’t truly private. But we’d prefer a coffee shop that wasn’t listening – and especially one that won’t later be able to identify the macchiato lovers by name. We need to start to think about search engines the same way and demand the same freedoms.”


More thoughts on Google

Having examined the motion and letters, I see a different picture emerging.

I am not a lawyer, but from my reading of the motion, it appears that Google’s objections are thin. Really thin.
Also, they seem to have been completely addressed by the scaling back of the DOJ requests. Of course, that’s not the complete story, but if the arguments in the motion are correct, it seems to me that Google will lose and be compelled to comply.

Based on the letters and other analysis, they’re also pulling the slippery slope defense – “we’re not going to comply with this because it will give you the expectation that we’re open for business and next time you can ask for personal information”. If that’s true, I think that’s the first good news I’ve heard out of them in years. Good luck with that.

Google’s own behavior is inconsistent with their privacy FAQ, which states “Google does comply with valid legal process, such as search warrants, court orders, or subpoenas seeking personal information. These same processes apply to all law-abiding companies. As has always been the case, the primary protections you have against intrusions by the government are the laws that apply to where you live.” (Interestingly, this language is inconsistent with their full privacy policy, which states that “Google only shares personal information … [when] We have a good faith belief that access, use, preservation or disclosure of such information is reasonably necessary to (a) satisfy any applicable law, regulation, legal process or enforceable governmental request.”)

I wonder if they intend to challenge the validity of the fishing expedition itself, which would be the real kicker (and probably invalidate the above paragraph). I also idly wonder if they expect to lose anyway and have simply refused to comply with bogus arguments in order to get the request entered into the public record.

Interesting stuff. A lot of my criticisms of Google are about their unwillingness to publicly state their intentions with respect to the data they get (and the extent to which they may or may not be retaining, aggregating, and correlating that data), and I don’t think this case is any different. I think Google’s interest here in not releasing records is aligned with the public good, and as such, I wish them well. It’s been asserted that Google has taken extraordinary steps to preserve the anonymity of its records, and that well may be true. It’s also kind of irrelevant. Beyond this specific case, of whether the government can request information about Google searches (let alone any of their more invasive services, or anyone’s more invasive services), is the issue of the ramifications of collecting, aggregating, and correlating this data in the first place.

There is no question that Google has access to a tremendous amount of data on everyone who interacts with its service. It is still troubling that its privacy policy is inadequate. It’s still troubling that Google (and Yahoo, and how many others) considers your IP address to be not personally identifiable information. It’s still troubling that Google (and Yahoo and how many others) do all of their transactions unencrypted and that search terms are included in the URL of the request. As this case has shown, Google’s actual behavior may not correlate to their stated intentions, of which there are few in the first place. By Google’s own slippery slope logic, this time it works for you – will it next time?

Perhaps it’s time to hold companies accountable for the records they keep.


Update on DOJ/Google

This is a fascinating deconstruction of the court documents and letters available so far:

DOJ demands large chunk of Google data

The Bush administration on Wednesday asked a federal judge to order Google to turn over a broad range of material from its closely guarded databases.

The move is part of a government effort to revive an Internet child protection law struck down two years ago by the U.S. Supreme Court. The law was meant to punish online pornography sites that make their content accessible to minors. The government contends it needs the Google data to determine how often pornography shows up in online searches.

In court papers filed in U.S. District Court in San Jose, Justice Department lawyers revealed that Google has refused to comply with a subpoena issued last year for the records, which include a request for 1 million random Web addresses and records of all Google searches from any one-week period.

I’m sort of out of analysis about why this is bad, because I’ve said it all before.

See (particularly 4 and 5):


It really comes down to one thing.

If data is collected, it will be used.

It’s far past the time for us all to take an interest in who’s collecting what.


Sometimes it hurts to be right.

Filed under: — adam @ 11:37 am

‘The Mozilla Team has quietly enabled a new feature in Firefox that parses ‘ping’ attributes to anchor tags in HTML. Now links can have a ‘ping’ attribute that contains a list of servers to notify when you click on a link. Although link tracking has been done using redirects and Javascript, this new “feature” allows notification of an unlimited and uncontrollable number of servers for every click, and it is not noticeable without examining the source code for a link before clicking it.’

‘I’m sure this may raise some eye-brows among privacy conscious folks, but please know that this change is being considered with the utmost regard for user privacy. The point of this feature is to enable link tracking mechanisms commonly employed on the web to get out of the critical path and thereby reduce the time required for users to see the page they clicked on. Many websites will employ redirects to have all link clicks on their site first go back to them so they can know what you are doing and then redirect your browser to the site you thought you were going to. The net result is that you end up waiting for the redirect to occur before your browser even begins to load the site that you want to go to. This can have a significant impact on page load performance.’

Oh, well, that makes it all okay then. It’s for the user experience.

Where does Darin’s next paycheck come from? Oh, right. It’s Google. But I’m sure they have only our best interests at heart.
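
If you want to check a page for this yourself, the sketch below scans HTML for anchor tags that carry a ‘ping’ attribute and lists the servers each one would notify. It uses Python’s standard html.parser module; the class name, function name, and sample markup are invented for illustration, and it assumes the attribute holds a space-separated list of URLs, as the description above suggests.

```python
from html.parser import HTMLParser


class PingFinder(HTMLParser):
    """Collect anchor tags that carry a 'ping' attribute."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (href, [ping URLs]) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        ping = attr_map.get("ping")
        if ping:
            # Split the attribute value on whitespace to get each URL.
            self.found.append((attr_map.get("href"), ping.split()))


def find_pings(html_text):
    """Return every link in html_text that would notify other servers."""
    finder = PingFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


# Made-up markup: one link with two ping targets, one ordinary link.
SAMPLE = (
    '<p><a href="http://news.example/story" '
    'ping="http://tracker-one.example/hit http://tracker-two.example/hit">Story</a> '
    '<a href="http://plain.example/">Plain</a></p>'
)
```

Running `find_pings(SAMPLE)` reports only the first link, along with both of its notification targets. That is exactly the information you can’t see just by looking at the rendered page.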


This is very very bad for Google’s stock price

Filed under: — adam @ 7:46 pm

“Billy Hoffman, an engineer at Atlanta company SPI Dynamics unveiled a new, smarter web-crawling application that behaves like a person using a browser, rather than a computer program. “Basically this nullifies any traditional form of forensics,” says Hoffman. The program comes from different internet addresses, simulates different browsers and throttles itself to human-like speeds.”

Currently, it’s hard to tell the difference between a human click and a robot click, but it’s still possible to make a reasonable guess, and cheap as they are, getting banks of low-paid clickers in 3rd world countries is still comparatively pricey.

But the ability to run a crawler that’s indistinguishable from a human blows all of that out the airlock. And if it’s impossible to tell the difference between an automated click and a human, the AdWords value proposition goes away.


Opera blogging policy

Filed under: — adam @ 8:34 pm

Opera has a public blogging policy. Google, which has fired at least one person for comments on his blog, doesn’t. Yet one more reason why I like Opera.

Interesting list of Google acquisitions

Filed under: — adam @ 12:38 pm

This is a list of companies that Google has bought. There are some on there that I hadn’t seen before.


Rumours of Google acquisition of Opera

Filed under: — adam @ 12:19 pm


Dear Google: Please stop buying good companies/developers and ruining them with your consumer unfriendly terms of service and loose privacy policies. Thanks a bunch. – Earth.

And I quote from Opera’s privacy policy:

No personal information is collected or shared, and providing ad profile information in the browser is strictly optional. The Opera user’s Web usage is not tracked.

There’s nothing like this in any Google policy, because this very idea is antithetical to Google’s philosophy, which wants to collect and know everything about you and use that to “improve the Google user experience”/stock price. This phrase in the Opera privacy policy is critical to what makes Opera any good at all. Let’s all gather round and keep an eye on that if this rumor turns out to be true.


Google really wants your logs

I wrote here about some of the privacy implications of Google’s data retention policies:

With the launch of Google Analytics, Google is now poised to collect that data not only from every Google visit, and every site that has Google ads on them, but also every site processed by Google for “analytical” purposes (although there’s probably a fair amount of overlap between the latter two).

Remember – Google does not consider your IP address to be personal information, and so it’s exempt from most of the normal restrictions on how they use the data they collect. The terms of service for Google Analytics suspiciously do not mention whether Google is allowed to utilize any of the data they collect on your behalf. One must conclude that they therefore assume that they are, and consequently that they do. It’s unclear, but it’s probably the case that Google could, according to the terms of these agreements, correlate search terms from your IP address with hits on other websites. I don’t see anything in there preventing them from doing so, because the two pieces of correlated data are obtained by different means.

Looky there, Google Web Accelerator is back

Filed under: — adam @ 12:59 pm

Google has apparently relaunched their controversial Web Accelerator.

I think I’ve already covered in detail all of the problems with this, and nothing seems to have changed except they’re just hoping people forgot about all of the reasons since last time, so just go read the previous articles if you missed them the first time around:

And especially this one:


Google Base launches

Filed under: — adam @ 12:46 pm

Clearly, Google would just as soon prefer not to have an internet at all.

Does BASE stand for “Big All-Seeing Eye”?


What’s wrong with the Google Print argument

Does this phrase sound familiar? “You may not send automated queries of any sort to Google’s system without express permission in advance from Google.” It’s from Google’s terms of service, and it’s just one of several aspects of that document that make this leave a bad taste in my mouth.

Larry Lessig makes the point that “Google wants to index content. Never in the history of copyright law would anyone have thought that you needed permission from a publisher to index a book’s content.” But that’s not what Google wants to do. Google wants to index content and put their own for-pay ads next to it. Larry says “It is the greatest gift to knowledge since, well, Google.”

Don’t forget this for a second. Google is not a public service, Google is a business. Google isn’t doing this because it’s good for the world, Google is doing this because it represents a massive expansion in the number of pages they can serve ads next to. In order to do that, the index remains the property of Google, and no one else will be able to touch it except in ways that are sanctioned by Google. It’s not really about money, it’s about control. It’s against the terms of service to make copies of Google pages in order to build an index. Why should it be okay for them to make copies of other people’s pages in order to build their own? It’s not that they’re making money that bothers us, it’s the double standard. The same double standard that says that Disney can take characters and stories from the public domain, copyright them, and then lock them up and prevent other people from using them.

Oh, but you hate that, don’t you, Larry? (And I think a lot of us do.) How is what Google is doing any different? Google is just extending the lockdown one step further, into their own pockets. There’s no share alike clause in the Google terms of service, and that is what’s wrong with it. They want privileges under the law that they’re not willing to grant to others with respect to their own content.

The day Google steps forward and says “we’re building an index, and anyone can access it anonymously in any way they please”, then sure – I’m all with you.

(Found at


Google Base

Filed under: — adam @ 10:41 am

Google has a new product in the works – Google Base. It’s essentially a free-form database with flexible and user-defined schemas that lets you “publish” items. Where they’re published is not yet apparent, although they’re clearly targeted directly at various Google services in addition to wherever they “live”.

Google, obviously, is tired of crawling the web for all your shit, and wants you to just give it to them directly in a way they can easily index.


Please stop telling people to “google it” in public forum posts

Filed under: — adam @ 12:09 pm

I’ve noticed that with increasing frequency, I’ll search for something on one of the search engines and be directed to some forum post where the answer is “google for the answer”.

How do you think I found you in the first place?!?!

If it’s possible that your page will turn up in the search results for the thing you’re discussing, please include the answer (or at least, a specific URL where the answer can be found).


Google and MSN search results differ on Google/Microsoft lawsuit results

Filed under: — adam @ 8:07 pm

A researcher found that a search for “Dr. Lee court documents Google Microsoft” (no quotes), in reference to the lawsuit between MS and Google over the hiring of a key employee, yielded vastly different results from MSN and Google. As happens, the results have been a bit skewed by the existence of this observation, but my results seem to roughly correspond to those reported.

This is an interesting contrast to the usual “we refuse to comment during an ongoing investigation”. I wonder if this is indirectly caused by indexing of internal company pages that link to one viewpoint or another.

Incidentally, I find it not surprising in the least that the search results aren’t impartial.


Why I oppose DRM

As some of you know, on September 11, 2001, I lived one block north of Battery Park, at 21 West Street. (Ironic popup tag provided courtesy of Google Maps.) When I was forced to leave for thirteen days while the smoke cleared, I had little time to grab anything. I left without my computers, without my original installation discs, and without all of my Product ID stickers. I found myself suddenly without the mechanism to reinstall a number of legally purchased programs that I needed to use for work, and taking a lot of time that could have been better spent wallowing in my own PTSD calling around to various companies to get them to unlock things for me.

There were stories of rescue workers hampered by license management, and that’s when I knew.

The world is dangerous, and sometimes emergencies happen. While people can say “hey, maybe we should make an exception here, because there are extenuating circumstances”, computers just don’t care about that. We are backing ourselves into a restricted corner, and a dangerous one, where computers call the shots, even in the midst of crisis, even in the midst of rational exceptions. Granted, every case is not this extreme. Hopefully, the future will be without another like it in my immediate vicinity. But the trend to pre-emptively lock down everything by default scares me.

As we evolve towards tighter and tighter controls without any possibility for exception, what happens when those granting agencies stop granting? What happens when companies that issue DRM go bankrupt? What happens if they’re unreachable? What happens if they simply decide to stop supporting their framework?

As my high school calculus teacher used to say – “it’s always easier to ask forgiveness than to ask permission”. Security is many tradeoffs, and if you restrict legitimate uses in the name of preventing illegitimate ones, you’ve cut off part of the point of having security in the first place. If you restrict legitimate uses without even preventing the illegitimate ones, you’re wasting your customers’ time, and you’re part of the problem.

See more of my rants on DRM and security.



Google maps hack to display Iraq casualties

Via hackaday:

[Update: another map (not a Google map), this one with casualties plotted over time by country and location in Iraq:]


“Beware the Google Threat”

Filed under: — adam @ 1:22 pm

Wired picks up on the fact that Google is growing fast, aggregating a LOT of data on its users (some of it incredibly personal) and not telling anyone what they’re doing with it behind the veil of vague privacy policies.

The article is a decent summary, but it also misses the point that a good deal (if not all) of the Google information is subject to revelation simply if “We conclude that we are required by law or have a good faith belief that access, preservation or disclosure of such information is reasonably necessary to protect the rights, property or safety of Google, its users or the public.”

That is NOT the same as a subpoena. I talked about this previously in a discussion about why Google’s needs come before your own.


Google patent filing reveals interesting information about ranking

Filed under: — adam @ 11:29 am

They apparently look at how far in the future your domain expiration is, how your links manifest over time, and how the focus of your site is directed over time, among other things.

It does look to me like, using their definitions, “blog” is largely indistinguishable from “spam”.

Here’s the actual patent application:

(Note: Google’s name doesn’t appear to be on this patent.)


Google will eat itself

Filed under: — adam @ 11:58 am


“We generate money by serving Google text advertisments on our website With this money we automatically buy Google shares via our Swiss e-banking account. We buy Google via their own advertisment! Google eats itself – but in the end we will own it!”


Pagerank… It’s made out of… PEOPLE

Filed under: — adam @ 7:37 pm

Search Bistro claims that Google pays a small army of people to vet search results. So much for the pigeons.


Prediction: GTax

In a conversation this weekend, on a whim, I made the prediction that within 3 years, Google will offer electronic tax filing.


Encryption is not a crime

I’m not sure how I feel about this.

A Minnesota court has ruled that the presence of encryption software is valid evidence for determining criminal intent. On the one hand, it seems like a severe misunderstanding of how the modern world actually works, given that encryption is absolutely essential for many things we take for granted.

I guess I can see that if there’s other evidence, this might be used as evidence that you have something to hide, but I worry for the situation where there isn’t any other evidence of a crime, and the fact that there’s something to hide becomes the key determining factor.

Everyone has something to hide. It may be private, it may be secret (not the same thing), it may be evidence of a crime, or it may be evidence of something that someone else thinks is a crime but you don’t. For the latter two, that is, of course, why we have a legal system in the first place. For the former two, there are plenty of legal reasons to want to keep those things private or secret.


Google is destroying the private

Filed under: — adam @ 12:13 pm

A year and a half ago, I read a great essay by Danny O’Brien (who now works at the EFF) illustrating the difference between public, private, and secret:

Google has a history of disregarding the private-but-not-secret. The Google Toolbar causes pages that aren’t linked from anywhere to end up in the index anyway when they’re visited. Now, they’re dismantling this distinction even further.

Some things aren’t linked, or they’re protected with plaintext passwords. THIS DOESN’T MEAN THEY ARE PUBLIC. By putting up a password but not encrypting, or not linking to pages, you’re saying “I know this isn’t really secret, but go away anyway. There’s nothing valuable to you here, and don’t make me work too hard to keep you out.” This is roughly equivalent to putting up a “no-trespassing” sign.

The Web Accelerator ignores private-but-not-secret login functionality by returning pages generated with the cookies (i.e.: logins) of other Web Accelerator users.

This is Google coming by and taking down all of the no-trespassing signs on the web, and forcing everybody to put up fences to keep the poachers out. I can’t even begin to see how this is okay.

Would Google be equally fine with the situation if some other company (Yahoo or Microsoft come to mind as the obvious candidates) were to release their own Web Accelerator that proxied Google pages and mangled all of the relationships between cookies and users?

Just because this stuff isn’t secret doesn’t mean it’s public either. There’s a distinction here that should be maintained, and isn’t. Google, not using https for all of its own pages, should realize and recognize this.


Google Web Accelerator breaks web apps

Filed under: — adam @ 11:03 am

By prefetching every link on a page, the Google Accelerator apparently clicks all of the “delete this” and “cancel” links too, and ignores javascript confirmations.
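The usual defense on the application side is to never let a plain link trigger a destructive action. Here is a minimal sketch of that rule, assuming the prefetcher marks its requests with an "X-moz: prefetch" header the way Mozilla's link prefetching does. The function names are made up for illustration and are not from any framework.

```python
# Sketch: only an explicit, non-prefetch POST may delete anything.
# Assumption: prefetch requests carry an "X-moz: prefetch" header.

SAFE_METHODS = {"GET", "HEAD"}


def is_prefetch(headers):
    """Return True if the request headers mark this request as a prefetch."""
    # Header names are case-insensitive, so normalize the keys first.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-moz", "").strip().lower() == "prefetch"


def may_delete(method, headers):
    """Decide whether a request is allowed to perform a destructive action."""
    method = method.upper()
    if method in SAFE_METHODS:
        # Following a link is a GET; GETs must never change state.
        return False
    if is_prefetch(headers):
        return False
    return method == "POST"


if __name__ == "__main__":
    print(may_delete("GET", {}))                        # a prefetched "delete this" link
    print(may_delete("POST", {}))                       # a real form submission
    print(may_delete("POST", {"X-moz": "prefetch"}))    # flagged as prefetch
```

With this in place a prefetcher that walks every link on the page can fetch the "delete this" URL all day and nothing gets deleted, because the delete only happens on a form POST.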

Way to go.

They don’t necessarily know who you are

In the last post, I wrote a lot about what’s wrong with Google’s new services and terms of service. I think one thing bears important repeating.

MANY of your important interactions with Google are unencrypted. As such, it is even more trivially easy to steal the value of someone’s Google cookie, and possibly pose as that person to Google. It’s possible that Google has taken precautions against this, but the risk is currently unknown. If this is possible, I think that throws a huge wrench into the use of this information by law enforcement.

I remember early discussions when it was first revealed that Google was storing a persistent lifetime cookie. It was generally perceived to be “okay” only because the value was not to be tied to search history in any way. We predicted that someday it would be.

Sometimes the slippery slope is actually slippery.


Google wants your logs

I’ve been kicking this around for a while, given the release of Google’s ability to save searches.

Google just announced the Google Web Accelerator, and this has the same kinds of privacy issues surrounding it, so I’ll discuss them both here. For those not in the know, Google Search History is the feature that lets you access your past searches if you’re logged into Google. The Web Accelerator is a proxy that pushes all of your browsing through Google’s servers. Ostensibly, this is to make your browsing faster, but it also has the side effect that Google can (and presumably will) monitor both the URLs and contents of every web page you’re looking at. You make a request for a web page, and Google fetches it for you. I’d expect that they’re also doing various tricks with preloading and caching.

Google is poised to collect a lot of data on browsing habits, and every indication is that they plan to keep it around.

As a brief aside, while I don’t personally know anyone who works for Google, I do have some friends who do. Every one of them has, in the past, asserted during conversations about Google’s privacy concerns, that Google both has (or had) no intentions of keeping permanent searching / browsing logs, and has (or had) actually built up complicated encryption / hashing mechanisms to allow aggregate data to be kept without individual search histories. That may have been true at one time, although I personally found it doubtful, given that if it were true, Google could only benefit by stating it publicly. They have never done so, and recent events have shown that assertion to be presently categorically false. Google does want to keep your individual search history. I think that’s a relevant point to the privacy debate.

In reference to search history, I wrote but never published, the following: “Search history is a sensitive area. Saving and aggregating search history is of dubious value to the end user – it’s maybe a minor convenience at best. If you care about that sort of thing, you’ll want to capture for yourself far more information than just search history, and do it locally across the board. There are several plugins for Firefox that will do exactly that for you, and not only watch your tracks, but save complete copies of everything you’re browsing.” In reference to the web accelerator, it’s evident that Google is heading towards collecting that information for themselves.

Set aside the fact that Google has now become an extremely juicy target for a one-stop shop for identity thieves. I’m sure they’ve got great security. But do you? Google’s lifetime cookie is, as always, a serious point of possible failure. One good cross-site scripting attack or IE exploit, or even a malicious extension, and the Google cookie can be easily exposed. What’s your liability for being associated with a search history, or now a browsing history, tied to a stolen Google cookie?

But here’s the real doozy.

The Google Privacy Policy states that Google may disclose personally identifiable information in the event that:

“We conclude that we are required by law or have a good faith belief that access, preservation or disclosure of such information is reasonably necessary to protect the rights, property or safety of Google, its users or the public.”

Welcome to Google, where the Third Law comes first.

This has serious implications. For logged-in users using all of Google’s services, this now includes the contents of your emails, your complete search AND browsing history, any geographical locations you’re interested in, what you’re shopping for, and probably plenty of things I haven’t thought of yet.

I posit that it would not significantly damage Google in any way for them to actually make use of this information, and that Google could withstand any public backlash resulting from it.

I think we’ve long passed the point at which we say “this is bad”.

This is bad.

In case you haven’t been paying attention, there’s a word for this.

It’s called “surveillance”.

I believe that Google should revise their privacy policy to reflect the actual intended usage of this information, and they should clarify under what circumstances this information will be released, and to whom. Will this information be used to catch terrorists? Errant cheating spouses? Tax evaders? Jaywalkers? Anarchists? Litterbugs? As a user, you have a right to demand to know. Of course, don’t expect Google to tell you, since they don’t actually get any of their money from you.



Google adds “Prove Adam Right” button

Filed under: — adam @ 8:27 am

“Google Inc. is experimenting with a new feature that enables the users of its online search engine to see all of their past search requests and results”.

There is so much wrong with this that I don’t even know where to begin. I’m writing something up for a followup post.


Note to users of Earth

Filed under: — adam @ 2:37 pm

“Google’s success doesn’t automatically mean that every wacky idea is worthwhile.”


Butler turns the tables on Google Autolink

Filed under: — adam @ 3:16 pm

Butler is a user script for Greasemonkey, that autolinks Google results to competing services. I’m curious to see how they like it, given that they think it’s okay to do to others.


Google nailed for using sleazy SEO tactics

Filed under: — adam @ 9:42 pm

Does Google really need to stuff keywords to increase their own rankings?


Why I don’t like Google Autolink

Filed under: — adam @ 11:33 am

Kottke thinks that the Google toolbar is a good idea. Here’s why I disagree.

I have a strong visceral reaction to this because it disturbs the decentralized nature of the web. It’s the same reason people got upset about DoubleClick tracking visits from one site to another through a shared cookie. It’s because a lot of what makes the web the web is that there are disparate competing resources from LOTS of different sources, and Autolink gives that the finger.

For me, the issue isn’t about modifying layout or even content, it’s about Google standing between the user and every other site and saying “you go here now”. Some things are bad just by being ubiquitous. In a sea of Amazon and Google Maps links, everything else will start to look out of place.

As for reasonable intelligent adults being able to make their own decisions, as I’ve said before, I think that technology has gotten to be too pervasive for the non-technical to have enough information and perspective to make these decisions, and the informed experts need to take a stand against what we perceive to be detrimental trends being enforced without full and knowledgeable consent.

Jason thinks this isn’t like DRM, but it is – it’s about centralized control. Don’t think for a second that this “puts the power in the hands of the user”.


Yahoo and Google take different evil baby steps

Filed under: — adam @ 1:16 pm

So I noticed that sometime in the past few days, Yahoo has started tracking outbound links. When you do a search on Yahoo, say… for “stuff”, you get this page, on which all of the links for results are filtered through one of Yahoo’s servers, so they can see what you actually clicked on. I’m pretty sure this wasn’t always the case, but I could have missed it.

The links look like this:*-http%3A//
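To make the mechanism concrete, here is a rough sketch of how the real destination can be pulled back out of a redirect-style tracking link. It assumes only what the fragment above shows: a "*-" marker followed by the percent-encoded target. The tracker hostname and path in the example are placeholders, not Yahoo's actual format.

```python
from urllib.parse import unquote

# Assumption: the destination follows a "*-" marker and is percent-encoded.
MARKER = "*-"


def destination(tracking_url):
    """Return the decoded target of a tracking link, or None if there is no marker."""
    _prefix, marker, encoded_target = tracking_url.partition(MARKER)
    if not marker:
        return None
    return unquote(encoded_target)


if __name__ == "__main__":
    # Hypothetical tracking link, for illustration only.
    link = "http://tracker.example/click/*-http%3A//www.example.org/page"
    print(destination(link))
```

The point is that the tracker sees the click first and then forwards you, so the destination has to ride along inside the link itself.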

Meanwhile, Google has resurrected Microsoft’s much-reviled Smart Tags, in the form of Autolink on the Google Toolbar. Google has decided in their infinite wisdom that, for example, anything that looks like an address should link to Google Maps. I fail to see how this was monopolistic behavior on the part of Microsoft but it’s totally okay for Google. There’s actually a word for this kind of unsupervised, non-user-controlled substitution… it’s “hijacking”.

Evil? Maybe. Potential for evil? Certainly. I’d call them “evil baby steps”. But both of these things certainly bear some discussion.


Canon 20Da (astro)

Filed under: — adam @ 2:50 pm

Canon has apparently modded the 20D to optimize it for long night shots with low noise:


Paper Katamari Damacy Prince

Filed under: — adam @ 12:38 pm

A paper cutout of the Prince (Google apparently thinks the Japanese word for Katamari Damacy means “Lump Soul”).

Also, some great screenshots of KD2:


Google adds “nofollow” attribute for links

Filed under: — adam @ 11:27 pm

Google is now honoring the rel="nofollow" attribute on links. Basically what this means is that links in comments, links to your competitors, or links to things you hate can be eliminated from consideration in computing the PageRank of the destination page.
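As a rough illustration of what honoring the attribute means for a crawler, here is a small sketch that splits the links on a page into ones that would count and ones marked nofollow. It uses Python's standard html.parser; the class and function names are my own, and this says nothing about how Google actually implements it.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hrefs from anchor tags, separated by whether rel contains nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated token list; a valueless rel parses as None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollow) lists of hrefs found in the given HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollow


if __name__ == "__main__":
    page = (
        '<p><a href="http://a.example/">post link</a> '
        '<a rel="nofollow" href="http://b.example/">comment link</a></p>'
    )
    print(split_links(page))
```

Blog software that adds the attribute to every comment link automatically puts all of them in the second list, legit or not.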

MSN search and Yahoo are also adding this:

On the one hand, I think this is a great idea and a long time coming – I’ve often complained that links aren’t all alike and should be treated differently.

But, on the other… I run a small blog, and I get a lot of my page karma from comments I put in other blogs. I don’t see that as necessarily wrong – my comments are always on topic, and if the owner of the blog doesn’t agree, they can always delete the comment or the link. Google isn’t tracking clickthroughs (yet), so they have no way to know if a given link in this context is actually popular or not. Automatically including this tag in the comments section may decrease the level of comment spam, but it’s also going to hurt a lot of small bloggers as well, I think. And if you’re reading the links individually to make the distinction, well… why not just delete the spam ones? This is obviously meant to be an automated measure, and it’s going to catch a lot of legit links too.

It’s just pushing the unknown down one layer, and substituting one set of unknowns (owner links vs. comment links) for another (legit comment links vs. spam links).


Okay, “mini” is now over.

Filed under: — adam @ 1:19 pm


The dangers of context-sensitive ads

Filed under: — adam @ 11:54 am

The feedburner rss feed Amazon ad insertion service is serving inappropriate ads.

I believe the Google context-sensitive ad service takes measures to prevent this sort of thing. They all should.

This is a screen capture from my bloglines page (not my blog):

Tsunami ad screenshot

I’ve notified feedburner of this, but I think it’s important to think about the implications in general. If we’re going to start inserting context-sensitive ads all over the place, there ought to be a little sensitivity to, say, the death of more than 75,000 people.


What the bagel man saw (and started)

Filed under: — adam @ 11:50 am

This is a great story about a guy who retired to sell bagels on the honor system. It’s also an interesting glimpse into the sociology of white collar crime.

Update: My friend Mark points out:

“As noted at the very very end of your link, this is actually from the NYTimes Magazine 6/6/2004 (where I first read it). Of course, you can no longer read it at for free (too long ago), and it’s your choice to circumvent (c) by pointing to the blog entry. But you should at least give the proper credit in your lead-in, methinks.”

While credit is certainly due, I’m not sure I agree that it’s a copyright violation. It certainly is a sticky situation.

I think this may qualify as “copying for personal use” even though it happens to be accessible to the public. It looks like this may have come from the email list. The original ad is included, it links back to the source, and there’s no financial gain involved in distributing it. There’s ostensibly financial loss on the part of the NYT, because people who might have paid to view the full article now don’t have to. I will note that the NYT general terms of service includes no mention of articles sent via email. I will also note that this article does NOT appear in the Google cache, or even appear to be indexed at all by Google (although that may just have fallen into the hole where Google was unable to index new pages because they’d hit the 32-bit limit on their indexes, and not be directly copyright related).

But, let’s try to be fair about this. The NYT is charging $2.95 for back articles. That seems like a lot, although you get a bulk discount, which has never made sense to me for electronic content. Still, let’s call it a market rate of $3. Since they have a monopoly, they can set the price. So, I propose this. If you read this article and liked it, let’s be fair to the NYT, and try to convince them that they’ll do better by asking for money than by demanding it. Clearly, they can’t stop this article from being copied. I’m mostly of the opinion that they shouldn’t try.

So I’ve set up a dropcash donation page to make voluntary donations to the NYTimes for enjoying this fine piece of writing (and others).

“The New York Times charges for back articles. We think this is unfair and expensive. This is a voluntary fund to donate money to the New York Times as compensatory payment for viewing articles that have been copied. This is an exercise to convince the New York Times (and the content generation industry in general) that they can get more money by asking for it than by demanding it, and that we acknowledge that copying can’t be controlled.

For more background on how this came about, please see:

This is a voluntary donation, I’m guessing it’s not tax deductible, and I hope it’s not an admission of guilt.

The New York Times, as far as I can tell, has no official channel for receiving paypal donations. I figure that Daniel Okrent, “the readers’ representative” is the right person to deal with this sort of thing, so I have used his address as the paypal recipient.

I have chosen the current market cap of the NYT corporation (5.9B) as the goal for this campaign.”

Another update: It appears that the dropcash page doesn’t update the donated totals until the money has been approved by the recipient. Since the NY Times doesn’t officially accept paypal, they may never approve them, and so the donation page may never rise above zero. Please donate anyway.

(Also, don’t let this stop you from donating money to the Tsunami relief fund. Do both.)


New server worm targets php and uses Google to spread

Filed under: — adam @ 11:17 pm


“The Santy worm uses a flaw in the widely used community forum software known as the PHP Bulletin Board (phpBB) to spread, according to updated analyses. The worm searches Google for sites using a vulnerable version of the software”


Google launches Scholar search

Filed under: — adam @ 10:32 am

"Google Scholar enables you to search specifically for scholarly literature, including peer-reviewed papers, theses, books, preprints, abstracts and technical reports from all broad areas of research. Use Google Scholar to find articles from a wide variety of academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the web."

Interesting. There are other journal search engines out there, but I think this is the first time they’ve been incorporated by a general search engine, for free. I hope that the others follow this lead – I’d like to see even more of this kind of information open to the public.


Gmail security breach

Filed under: — adam @ 7:43 pm

There’s a Gmail exploit that allows an attacker to steal your Gmail cookie, which thereafter identifies them as you to the system, even if you change your password.

This seems like a huge problem for Google, above and beyond the actual security breach. Remember that Gmail uses the same unlimited lifetime Google cookie. The data in that cookie is, presumably, extremely valuable for their tracking efforts, and I’d guess that this will be difficult for them to fix in a way that maintains that.


Frightening consolidation of data

Filed under: — adam @ 1:52 pm

Google acquires Keyhole, a company specializing in high-resolution satellite imagery.

Add to the list of things Google knows about you: Where you live.


Kaboom! Google Desktop Proxy

Filed under: — adam @ 6:40 pm

Google Desktop limits your searches to just your local loopback interface to prevent people on other machines from querying your index. Hah! Along comes the Google Desktop Proxy, which allows open searches from other machines. While this is theoretically meant to be used for your benefit, I will NOT be surprised if this or something like it shows up in an email worm somewhere along the line.
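Here is a minimal sketch of why a loopback-only check stops being a barrier once a proxy runs on the same machine. It models the check as "is the client address a loopback address", which is an assumption about how such a restriction typically works and not a description of Google Desktop's actual code. All names are hypothetical.

```python
import ipaddress


def is_local_client(client_ip):
    """Return True if the connecting address is a loopback address."""
    return ipaddress.ip_address(client_ip).is_loopback


def address_seen_via_local_proxy(remote_ip):
    """Model a proxy on the same machine: it re-issues the request itself,
    so the index sees the proxy's loopback address, not the remote one."""
    return "127.0.0.1"


if __name__ == "__main__":
    remote = "192.168.1.20"
    print(is_local_client(remote))                                # direct remote query
    print(is_local_client(address_seen_via_local_proxy(remote)))  # same query via proxy
```

The restriction only looks at where the connection appears to come from, so anything that can run locally and relay requests, benign or otherwise, gets treated as the owner of the machine.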


Google Desktop Sharing could be really really bad

Filed under: — adam @ 1:45 pm

It’s possible that Google might want to network the various installations of Google Desktop into a P2P network. Google already has all of the pieces to make this work – the Hello software that’s part of Picasa already does it. It does raise this question: if they use Orkut to enable file sharing with Orkut friends, does the content you share then fall under the Orkut policies? Probably, which means that if this happens and you use it, Google has silently acquired a “worldwide, non-exclusive, sublicenseable, transferable, royalty-free, perpetual, irrevocable right to copy, distribute, create derivative works of, publicly perform and display” any of your files. “Don’t be evil”, my ass.

The orkut terms of service are extremely one-sided (much more so than any of the other Google services), and any attempts by Google to incorporate orkut into any of its other services should be watched carefully. Or, even better, it should probably be stricken from this world.

Paolo Massa Blog: Enormous P2P Network by Google

The orkut Terms of Service

My earlier analysis on this.


Please don’t install Google Desktop

Filed under: — adam @ 11:54 pm

Google has repeatedly, across all of their recent product launches, failed to properly address privacy and security concerns.

“Don’t be evil” doesn’t cut it.

Google Desktop privacy branded ‘unacceptable’ | The Register

On top of that, there’s a larger issue at hand which has not been properly addressed. There is a difference between network content and desktop content. There is a security/privacy difference (who do I want to see this?), a latency difference (how long does it take to see this?), and a control difference (will I always be able to see this, in this form?), at the very least. Treating network content and desktop content as the same thing is a leaky abstraction at best. Blurring the line without properly addressing these differences is an invitation to disaster. But sure, it’s easier than teaching people what the differences are.
