
04/06/2004 Archived Entry: "Google, data mining, and cookies"

I see that Wendy has already written about Google's awful new web-mail service. Here's another article about Google's attack on privacy. Google intends to retain all of your "Gmail," even after you terminate service, snoop through it, and match it up to any other data they have about you. I'm sure John Ashcroft is licking his lips in anticipatory delight.

It's bad enough when government does this; it's no better when done by the private sector, and Google won't be hobbled by incompetence. Do not sign up for Gmail. It might be safe for a "throwaway" email account, but I think a full boycott is in order. I'm sufficiently angry that I'm even evaluating alternative search engines.

This returns me to the subject of "cookies" and their use in data mining.

When you visit a web page, that web server has the ability to store a "cookie" on your computer. This is just a bit of data that is "persistent" -- it stays there after you leave that page and go visit others. It can even stay when you shut your browser down and restart it later. If you revisit that web page (or another page on the same site), that web site can read the cookie it previously stored. (Or it can discover that it hasn't previously stored a cookie, and then choose to store one.)
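Here is a minimal sketch of that store-then-read cycle from the server's side, using Python's standard http.cookies module. The cookie name "visitor", the value "abc123", and the handle_request function are made up for illustration; a real site would pick its own names and generate a different value for each visitor.

```python
from http.cookies import SimpleCookie

def handle_request(cookie_header):
    """Return (greeting, Set-Cookie header or None) for one page request.

    cookie_header is the text of the browser's Cookie: header, or None
    if the browser sent no cookies for this site.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if "visitor" in jar:
        # The site reads back the cookie it stored on an earlier visit.
        return "welcome back, " + jar["visitor"].value, None

    # No cookie yet, so the site chooses to store one.
    fresh = SimpleCookie()
    fresh["visitor"] = "abc123"                    # hypothetical value
    fresh["visitor"]["path"] = "/"                 # valid for the whole site
    fresh["visitor"]["max-age"] = 30 * 24 * 3600   # survives browser restarts
    return "hello, stranger", fresh.output(header="Set-Cookie:")

# First visit: nothing stored yet, so the server asks the browser to keep a cookie.
first_greeting, first_header = handle_request(None)

# Later visit: the browser sends the cookie back and the server recognizes it.
second_greeting, second_header = handle_request("visitor=abc123")
```

The max-age setting is what makes the cookie "persistent"; without an expiry, browsers generally discard a cookie when they shut down.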

This is reasonably innocuous, and has many valid uses. For example, when you log into the ifeminists.net forum, the forum software stores a cookie so that it knows who you are. As you visit all the various forum pages, the software can "remember" you so you don't need to log in again.

Mozilla stores cookies in a text file (cookies.txt), so they're easy to see. Here's a cookie I picked up while visiting The Register:

www.theregister.co.uk FALSE / FALSE 1082723858 Dwww.theregister.co.ukWIDYMD #20737:DDF#18919:DDE#

Beyond the first part (the domain name) I have no idea what this means, but it means something to the software at The Register. Caution: cookies often contain encrypted (or unencrypted!) user names and passwords, so keep them private. I don't have an account at The Register, so I know this particular cookie doesn't contain any sensitive information.
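The name and value at the end are opaque, but the line as a whole appears to follow the seven-field layout of the Netscape-style cookies.txt file: domain, a subdomain flag, path, a secure flag, expiry time (seconds since 1970), name, and value. Here is a rough sketch that pulls the fields apart. The parse_cookie_line function and the field labels are my own, and I'm assuming it is safe to split on whitespace because this particular line contains no embedded spaces (the real file separates fields with tabs).

```python
from datetime import datetime, timezone

# Field order of a Netscape-style cookies.txt line (labels are my own).
FIELDS = ("domain", "include_subdomains", "path", "secure",
          "expires", "name", "value")

def parse_cookie_line(line):
    """Split one cookies.txt line into a dict of named fields."""
    parts = line.split()  # assumption: no spaces inside any field
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    record = dict(zip(FIELDS, parts))
    record["include_subdomains"] = record["include_subdomains"] == "TRUE"
    record["secure"] = record["secure"] == "TRUE"
    # The expiry is a Unix timestamp; convert it to a readable UTC time.
    record["expires"] = datetime.fromtimestamp(int(record["expires"]),
                                               tz=timezone.utc)
    return record

line = ("www.theregister.co.uk FALSE / FALSE 1082723858 "
        "Dwww.theregister.co.ukWIDYMD #20737:DDF#18919:DDE#")
cookie = parse_cookie_line(line)
for field in FIELDS:
    print(field, "=", cookie[field])
```

Read this way, the cookie applies to every path on the site, is sent over ordinary (non-secure) connections, and expires on a fixed date. What the name and value mean is still known only to the site's software.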

Ordinarily, only the web site that stored a cookie can retrieve it. (In this case, www.theregister.co.uk.) But this didn't stop cookies from being used in one of the great data-mining scandals of the Internet. Here's how programmers added 2+2+2+2 to get 8:

1. When you load an image as part of a web page, your computer treats it the same as loading a web page. (It uses the same HyperText Transfer Protocol and the same GET command.)

2. Images don't need to be stored on the same server as the web page. An image on a whoozis.com web page can be fetched from, say, doubleclick.com. This is common for web page advertising. (On Mozilla you can observe these by clicking View -> Page Info -> Media.)

3. Because loading an image is like loading a web page, the server that provides the image is told the "referring page," which in this case is the web page that contains the image (the web page you're viewing).

4. Also, because it's like loading a web page, the web server that provides the image can store and read its own cookies on your machine. This cookie can contain an identifying code which is unique to your machine.

Add it up, and it means that the advertising company can keep a record of every web page you visit that carries one of their ads. They can build a "profile" of your web surfing. And if they can match this data up to you personally -- say, through your IP address or an innocent-looking "free" subscription-based service -- then your privacy is gone.
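To make the arithmetic concrete, here is a toy simulation of the four steps. It is a sketch of the general idea, not anyone's actual system: the AdServer and Browser classes, the "uid-N" codes, the server name "ads.example", and the page addresses are all invented for illustration.

```python
import itertools
from collections import defaultdict

class AdServer:
    """Toy ad server: hands out one ID cookie per browser and logs referers."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.profiles = defaultdict(list)  # cookie value -> pages seen

    def serve_image(self, referer, cookie=None):
        """Handle one image GET; return the cookie the browser should keep."""
        if cookie is None:
            # Step 4: no cookie yet, so issue a unique identifying code.
            cookie = "uid-%d" % next(self._ids)
        # Step 3: the referring page is recorded against that code.
        self.profiles[cookie].append(referer)
        return cookie

class Browser:
    """Toy browser: keeps a separate cookie for each server it talks to."""

    def __init__(self):
        self.cookies = {}  # server name -> cookie value

    def view_page(self, page_url, ad_server_name, ad_server):
        # Steps 1 and 2: the page embeds an image hosted on the ad server,
        # so the browser fetches it and sends back that server's own cookie.
        stored = self.cookies.get(ad_server_name)
        self.cookies[ad_server_name] = ad_server.serve_image(page_url, stored)

ads = AdServer()
browser = Browser()
pages = [
    "http://www.whoozis.com/news.html",
    "http://shop.example/cart.html",
    "http://forum.example/thread/42",
]
for url in pages:
    browser.view_page(url, "ads.example", ads)

print(dict(ads.profiles))
```

None of the three sites ever sees the ad server's cookie, and the ad server never sees theirs. The profile builds up anyway, because the same image host is embedded in all three pages.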

Talk about unexpected consequences.

In the uproar over this, most web browsers began adding cookie-control features. Mozilla and Opera let you block "third-party" cookies (cookies that don't come from the site you're visiting). You can even block cookies from specific sites, or all cookies, or require manual approval of each cookie. Both also provide "cookie managers" that let you inspect and remove individual cookies.

Many web sites depend on cookies to function. So I leave my browsers set to reject third-party cookies, and allow cookies from the "originating web site" (Mozilla's term). This gives pretty good protection. For maximum privacy, I'll switch one browser (usually Opera) to reject all cookies.
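The decision a browser has to make can be sketched in a few lines. This is a simplified illustration, not how Mozilla or Opera actually implement it: the function names and the three policy labels are my own, and the host comparison is naive (real browsers apply more careful rules about which domains count as the same site).

```python
from urllib.parse import urlsplit

def is_third_party(page_url, cookie_domain):
    """True if cookie_domain is neither the page's host nor a parent of it."""
    page_host = urlsplit(page_url).hostname or ""
    cookie_host = cookie_domain.lstrip(".").lower()
    same_site = (page_host == cookie_host
                 or page_host.endswith("." + cookie_host))
    return not same_site

def accept_cookie(page_url, cookie_domain, policy="originating-only"):
    """Decide whether to store a cookie under one of three policies."""
    if policy == "reject-all":
        return False
    if policy == "accept-all":
        return True
    if policy == "originating-only":
        return not is_third_party(page_url, cookie_domain)
    raise ValueError("unknown policy: %r" % policy)

# A cookie from the site being viewed is kept.
first_party = accept_cookie("http://www.theregister.co.uk/",
                            "www.theregister.co.uk")

# A cookie set by an ad image from another domain is refused.
third_party = accept_cookie("http://www.whoozis.com/page.html",
                            "doubleclick.com")

# With everything blocked, even the site's own cookie is refused.
blocked = accept_cookie("http://www.theregister.co.uk/",
                        "www.theregister.co.uk", policy="reject-all")
```

The "originating-only" setting corresponds to the way I normally run, and "reject-all" to the maximum-privacy setting.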

brad

