Is Google’s AdWords Guilty of Racial Profiling? No, not really…

Posted on February 5, 2013 by Paul Ratcliffe

A recent paper by Dr Latanya Sweeney of Harvard seems to suggest that Google’s AdSense ad delivery network is guilty of racial discrimination when displaying ads (although it’s actually talking about AdWords). This has led to a whole bunch of articles appearing across the web today covering the original paper with varying degrees of accuracy and accusatory headlines, e.g. Salon with “Online Advertising’s Racism Mess”, Geekosystem with the less inflammatory “Harvard Professor Says Google Results Reflect Racism”, Gizmodo with the off-target “Are Google Searches Racist” (the paper doesn’t talk about Google search results, only adverts), MIT Technology Review with the relatively sober “Racism is Poisoning Online Ad Delivery, Says Harvard Professor” and the straight to the point “Is Google racist?” from the UK’s Daily ~~Fail~~ Mail.

Dr Sweeney’s paper makes for an interesting read. Here’s the gist:

Over in the US, a company called Instant Checkmate provides a service whereby you can use their website to search through the public arrest records of much of the United States. They decided to run a Google AdWords campaign to advertise their service through the “Search Network” which places ads on the Google search results page and on the search results pages of third party companies that use Google search. Note: the original paper conflates AdSense with AdWords which doesn’t help when it comes to drawing conclusions.

Instant Checkmate set up their ads in such a way that when someone searched for something that AdWords decided was a real name (consisting of first name and last name), that name would be inserted into the advert. They also created a few different versions of their advert using different wording. The upshot of this setup is that typing a real name into google.com could trigger an advert of the form:

Bob Madeupname Arrested? or
Located: Bob Madeupname.

Dr Sweeney ran a number of searches on google.com using one set of names that (in the US) is more usually associated with black people and one set of names that (in the US) is more usually associated with white people. She found that the “usually black” names triggered the “Arrested” form of the advert 60% of the time whereas the “usually white” names only triggered the “Arrested” form of the advert 48% of the time, which (if all forms of bias in the experimental design have been accounted for) is a statistically significant result (i.e. highly unlikely to have happened through chance).
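To get a feel for what “statistically significant” means for a 60% versus 48% split, here is a minimal sketch of a pooled two-proportion z-test. This isn't necessarily the test used in the paper, and the sample sizes are hypothetical (I haven't quoted the paper's counts here); only the two percentages come from the text above.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions.

    Uses the standard pooled-variance z-test with a normal approximation.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided tail probability of a standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts: 60% vs 48% "Arrested" ads at two made-up sample sizes.
scenarios = {
    "50 searches per group": (30, 50, 24, 50),
    "1000 searches per group": (600, 1000, 480, 1000),
}

for label, counts in scenarios.items():
    z, p = two_proportion_z_test(*counts)
    print(f"{label}: z = {z:.2f}, p = {p:.3g}")
```

With these made-up counts the same 12 point gap is nowhere near significant at 50 searches per group, but overwhelmingly so at 1,000 per group. So the percentages alone don't tell you much without the number of searches behind them.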

So far so good – it’s an interesting result and the method used in the paper seems at first glance to be reasonable (clearing of cache and cookies between each search, searching from different locations). Where it all goes wrong is in the analysis and conclusions section of the paper. After a couple of paragraphs speculating (in no great depth) on what might be the cause of this result, and saying (quite rightly) that more research is needed, Dr Sweeney then states outright “There is discrimination in delivery of these ads”. Errrmmm, that’s a bit of a strong conclusion to draw based on a single, fairly small study in a paper that makes no real attempt to explain any possible sources of bias in the data. Still, I suppose it makes for good headlines…

Here are a few lines on some factors that could be influencing the results (some covered in the paper, some not):

1. Different bidding in the ad auction on “usually black” names compared to “usually white” names. This seems like the obvious conclusion to draw, but a footnote in the paper states that Dr Sweeney spoke with Instant Checkmate and they said that they created a single set of ad templates and set up one set of keywords relating to a bunch of last names that would trigger the ads, so that would suggest we can discount this point.
2. The “Quality Score”. Google assigns a score to each keyword you bid on and each advert you create based on how well it perceives that the advert and keyword relate to the content of the website that you are advertising. If you have two ads with the same bid in the ad auction, the one with the higher quality score will win. So, it’s possible that the “Arrested” ads combined with the “usually black” names have a higher quality score than the “Arrested” ads with the “usually white” names. I don’t know how much of the record set Instant Checkmate uses is exposed to Google, but given the hugely skewed racial make-up of the US prison population compared to the overall population this could happen, thus affecting the way the ads are delivered.
3. Human behaviour. This is touched on in the paper. Google will rotate the ad templates that you supply to AdWords and preferentially show the ones that have historically got the most clicks. So, if the “Arrested” form of the advert gets more clicks it will be shown more often than the “Located:” form of the advert. At first glance it seems like this is a likely cause of the differential. But, here’s an interesting thought: to the best of my knowledge this only happens at the ad level, not at the keyword level (feel free to put me right – comments below). So, if there is only one set of ad templates and one set of keyword last names, all bundled up together (see point 1), then surely this wouldn’t apply, as any changes in the click rate would affect both sets of names equally. Hmmm… As the paper says, more work needed here…
4. Is there any geographical targeting going on? Different parts of the US have greatly differing demographics regarding race and income. It’s possible that even if there is only one set of templates and one set of names there are different ad campaigns (perhaps with different budgets) set up in different parts of the US. Each one of those would be separately influenced by the human behaviour aspect touched on above. When viewed as a whole you might then get the picture seen in the paper, but it might not exist at the level of a single campaign (the experiment isn’t detailed enough as regards locations of searching to discern this). Without access to this information it’s extremely hard to interpret the results.
5. What is the income of Instant Checkmate’s average customer for each of the services that they offer? Bear with me here, but this really does make a difference. If the “Arrested” advert is advertising a service that is more often taken up by people living closer to the poverty line than the “Located” service, then this could hugely skew the results. The poverty rate among black people in the US is far higher than the national average so there is a distinct possibility that the results seen in the paper are at least partly down to relative levels of poverty associated with those groups of names rather than race per se.
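The geographical targeting point above is easier to see with some numbers. The sketch below is purely illustrative: the two regional campaigns, their “Arrested” shares and the search counts are all invented, and I have no idea whether Instant Checkmate actually ran separate campaigns. Within each hypothetical campaign both name groups see the “Arrested” template at exactly the same rate, yet the combined figures differ simply because the two groups are searched for in different proportions in each region.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    """A hypothetical regional campaign (all figures are invented)."""
    name: str
    arrested_share: float  # fraction of impressions showing the "Arrested" template
    searches_group_a: int  # searches for group A names in this region
    searches_group_b: int  # searches for group B names in this region


def aggregate_arrested_share(campaigns, group):
    """Overall share of "Arrested" ads seen for one name group across all campaigns."""
    total_searches = 0
    arrested_impressions = 0.0
    for campaign in campaigns:
        if group == "a":
            searches = campaign.searches_group_a
        else:
            searches = campaign.searches_group_b
        total_searches += searches
        # Same template share for both groups inside a campaign, by construction.
        arrested_impressions += searches * campaign.arrested_share
    return arrested_impressions / total_searches


# ASSUMPTION: two made-up regions with different click histories and name mixes.
campaigns = [
    Campaign("region_1", arrested_share=0.70, searches_group_a=800, searches_group_b=200),
    Campaign("region_2", arrested_share=0.40, searches_group_a=200, searches_group_b=800),
]

for group in ("a", "b"):
    share = aggregate_arrested_share(campaigns, group)
    print(f"group {group}: {share:.0%} of ads were the 'Arrested' form")
```

With these invented inputs group A ends up seeing the “Arrested” form 64% of the time and group B 46%, even though no individual campaign treats the two groups differently. That doesn't show this is what happened, only that aggregate figures can't rule it out.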

So, in conclusion, it’s an interesting paper but, like all too much scientific literature, it fails to think hard enough about its experimental design (see Ben Goldacre’s Bad Science site for lots more examples). As a result its headline-grabbing conclusion really can’t be supported. Is Google’s AdWords guilty of racial profiling? No, not really. At worst it could be reflecting a number of problems in the US that are related to racism and/or poverty, but without considerably more background information on exactly how the Instant Checkmate campaign was set up there isn’t an easy conclusion to draw here.

Update 7 Feb 2013

I’ve been thinking about this a little more and realised that one of the things that annoyed me most about this paper is that it treats Google’s services as if they were natural phenomena that can only be observed, not influenced in any way. They aren’t; they are a set of known algorithms. Dr Sweeney’s paper could have been really interesting, but falling foul of this mindset (which is all too common in the internet and SEO industry) makes for a missed chance.

If she had wanted to look at if and how human behaviour skews the delivery of Google adverts in a discriminatory fashion it would have been very easy to do. Google AdWords is open to everyone, so why not set up a properly controlled experiment where everything about the design of the adverts is known and then observe the results rather than look at how somebody else’s ads are delivered and try and second guess how they were set up? As the paper was apparently part funded by Google this could probably have been set up at little to no cost, and possibly with access to extra data on the ad delivery direct from the horse’s mouth.