Google’s FLoC Is a Terrible Idea

Source: https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea

by: Bennet Cyphers | Published: March 03, 2021

The third-party cookie is dying, and Google is trying to create its replacement.

No one should mourn the death of the cookie as we know it. For more than two decades, the third-party cookie has been the lynchpin in a shadowy, seedy, multi-billion dollar advertising-surveillance industry on the Web; phasing out tracking cookies and other persistent third-party identifiers is long overdue. However, as the foundations shift beneath the advertising industry, its biggest players are determined to land on their feet.

Google is leading the charge to replace third-party cookies with a new suite of technologies to target ads on the Web. And some of its proposals show that it hasn’t learned the right lessons from the ongoing backlash to the surveillance business model. This post will focus on one of those proposals, Federated Learning of Cohorts (FLoC), which is perhaps the most ambitious—and potentially the most harmful.

FLoC is meant to be a new way to make your browser do the profiling that third-party trackers used to do themselves: in this case, boiling down your recent browsing activity into a behavioral label, and then sharing it with websites and advertisers. The technology will avoid the privacy risks of third-party cookies, but it will create new ones in the process. It may also exacerbate many of the worst non-privacy problems with behavioral ads, including discrimination and predatory targeting.

Google’s pitch to privacy advocates is that a world with FLoC (and other elements of the “privacy sandbox”) will be better than the world we have today, where data brokers and ad-tech giants track and profile with impunity. But that framing is based on a false premise that we have to choose between “old tracking” and “new tracking.” It’s not either-or. Instead of re-inventing the tracking wheel, we should imagine a better world without the myriad problems of targeted ads.

We stand at a fork in the road. Behind us is the era of the third-party cookie, perhaps the Web’s biggest mistake. Ahead of us are two possible futures.

In one, users get to decide what information to share with each site they choose to interact with. No one needs to worry that their past browsing will be held against them—or leveraged to manipulate them—when they next open a tab.

In the other, each user’s behavior follows them from site to site as a label, inscrutable at a glance but rich with meaning to those in the know. Their recent history, distilled into a few bits, is “democratized” and shared with dozens of nameless actors that take part in the service of each web page. Users begin every interaction with a confession: here’s what I’ve been up to this week, please treat me accordingly.

Users and advocates must reject FLoC and other misguided attempts to reinvent behavioral targeting. We implore Google to abandon FLoC and redirect its effort towards building a truly user-friendly Web.

What is FLoC?

In 2019, Google presented the Privacy Sandbox, its vision for the future of privacy on the Web. At the center of the project is a suite of cookieless protocols designed to satisfy the myriad use cases that third-party cookies currently provide to advertisers. Google took its proposals to the W3C, the standards-making body for the Web, where they have primarily been discussed in the Web Advertising Business Group, a body made up primarily of ad-tech vendors. In the intervening months, Google and other advertisers have proposed dozens of bird-themed technical standards: PIGIN, TURTLEDOVE, SPARROW, SWAN, SPURFOWL, PELICAN, PARROT… the list goes on. Seriously. Each of the “bird” proposals is designed to perform one of the functions in the targeted advertising ecosystem that is currently done by cookies.

FLoC is designed to help advertisers perform behavioral targeting without third-party cookies. A browser with FLoC enabled would collect information about its user’s browsing habits, then use that information to assign its user to a “cohort” or group. Users with similar browsing habits—for some definition of “similar”—would be grouped into the same cohort. Each user’s browser will share a cohort ID, indicating which group they belong to, with websites and advertisers. According to the proposal, at least a few thousand users should belong to each cohort (though that’s not a guarantee).

If that sounds dense, think of it this way: your FLoC ID will be like a succinct summary of your recent activity on the Web.

Google’s proof of concept used the domains of the sites that each user visited as the basis for grouping people together. It then used an algorithm called SimHash to create the groups. SimHash can be computed locally on each user’s machine, so there’s no need for a central server to collect behavioral data. However, a central administrator could have a role in enforcing privacy guarantees. In order to prevent any cohort from being too small (i.e. too identifying), Google proposes that a central actor could count the number of users assigned each cohort. If any are too small, they can be combined with other, similar cohorts until enough users are represented in each one.

For FLoC to be useful to advertisers, a user’s cohort will necessarily reveal information about their behavior.

According to the proposal, most of the specifics are still up in the air. The draft specification states that a user’s cohort ID will be available via Javascript, but it’s unclear whether there will be any restrictions on who can access it, or whether the ID will be shared in any other ways. FLoC could perform clustering based on URLs or page content instead of domains; it could also use a federated learning-based system (as the name FLoC implies) to generate the groups instead of SimHash. It’s also unclear exactly how many possible cohorts there will be. Google’s experiment used 8-bit cohort identifiers, meaning that there were only 256 possible cohorts. In practice that number could be much higher; the documentation suggests a 16-bit cohort ID comprising 4 hexadecimal characters. The more cohorts there are, the more specific they will be; longer cohort IDs will mean that advertisers learn more about each user’s interests and have an easier time fingerprinting them.

One thing that is specified is duration. FLoC cohorts will be re-calculated on a weekly basis, each time using data from the previous week’s browsing. This makes FLoC cohorts less useful as long-term identifiers, but it also makes them more potent measures of how users behave over time.

New privacy problems

FLoC is part of a suite intended to bring targeted ads into a privacy-preserving future. But the core design involves sharing new information with advertisers. Unsurprisingly, this also creates new privacy risks.

Fingerprinting

The first issue is fingerprinting. Browser fingerprinting is the practice of gathering many discrete pieces of information from a user’s browser to create a unique, stable identifier for that browser. EFF’s Cover Your Tracks project demonstrates how the process works: in a nutshell, the more ways your browser looks or acts different from others’, the easier it is to fingerprint.

Google has promised that the vast majority of FLoC cohorts will comprise thousands of users each, so a cohort ID alone shouldn’t distinguish you from a few thousand other people like you. However, that still gives fingerprinters a massive head start. If a tracker starts with your FLoC cohort, it only has to distinguish your browser from a few thousand others (rather than a few hundred million). In information theoretic terms, FLoC cohorts will contain several bits of [[https://en.wikipedia.org/wiki/Entropy_(information_theory) | entropy — up to 8 bits, in Google’s proof of concept trial. This information is even more potent given that it is unlikely to be correlated with other information that the browser exposes. This will make it much easier for trackers to put together a unique fingerprint for FLoC users.

Google has acknowledged this as a challenge, but has pledged to solve it as part of the broader “Privacy Budget” plan it has to deal with fingerprinting long-term. Solving fingerprinting is an admirable goal, and its proposal is a promising avenue to pursue. But according to the FAQ, that plan is “an early stage proposal and does not yet have a browser implementation.” Meanwhile, Google is set to begin testing FLoC as early as this month.

Fingerprinting is notoriously difficult to stop. Browsers like Safari and Tor have engaged in years-long wars of attrition against trackers, sacrificing large swaths of their own feature sets in order to reduce fingerprinting attack surfaces. Fingerprinting mitigation generally involves trimming away or restricting unnecessary sources of entropy—which is what FLoC is. Google should not create new fingerprinting risks until it’s figured out how to deal with existing ones.

Cross-context exposure

The second problem is less easily explained away: the technology will share new personal data with trackers who can already identify users. For FLoC to be useful to advertisers, a user’s cohort will necessarily reveal information about their behavior.

The project’s Github page addresses this up front:

This API democratizes access to some information about an individual’s general browsing history (and thus, general interests) to any site that opts into it. … Sites that know a person’s PII (e.g., when people sign in using their email address) could record and reveal their cohort. This means that information about an individual's interests may eventually become public.

As described above, FLoC cohorts shouldn’t work as identifiers by themselves. However, any company able to identify a user in other ways—say, by offering “log in with Google” services to sites around the Internet—will be able to tie the information it learns from FLoC to the user’s profile.

Two categories of information may be exposed in this way:

  1. Specific information about browsing history. Trackers may be able to reverse-engineer the cohort-assignment algorithm to determine that any user who belongs to a specific cohort probably or definitely visited specific sites.
  2. General information about demographics or interests. Observers may learn that in general, members of a specific cohort are substantially likely to be a specific type of person. For example, a particular cohort may over-represent users who are young, female, and Black; another cohort, middle-aged Republican voters; a third, LGBTQ+ youth.

This means every site you visit will have a good idea about what kind of person you are on first contact, without having to do the work of tracking you across the web. Moreover, as your FLoC cohort will update over time, sites that can identify you in other ways will also be able to track how your browsing changes. Remember, a FLoC cohort is nothing more, and nothing less, than a summary of your recent browsing activity.

You should have a right to present different aspects of your identity in different contexts. If you visit a site for medical information, you might trust it with information about your health, but there’s no reason it needs to know what your politics are. Likewise, if you visit a retail website, it shouldn’t need to know whether you’ve recently read up on treatment for depression. FLoC erodes this separation of contexts, and instead presents the same behavioral summary to everyone you interact with.

Beyond privacy

FLoC is designed to prevent a very specific threat: the kind of individualized profiling that is enabled by cross-context identifiers today. The goal of FLoC and other proposals is to avoid letting trackers access specific pieces of information that they can tie to specific people. As we’ve shown, FLoC may actually help trackers in many contexts. But even if Google is able to iterate on its design and prevent these risks, the harms of targeted advertising are not limited to violations of privacy. FLoC’s core objective is at odds with other civil liberties.

The power to target is the power to discriminate. By definition, targeted ads allow advertisers to reach some kinds of people while excluding others. A targeting system may be used to decide who gets to see job postings or loan offers just as easily as it is to advertise shoes.

Over the years, the machinery of targeted advertising has frequently been used for exploitation, discrimination, and harm. The ability to target people based on ethnicity, religion, gender, age, or ability allows discriminatory ads for jobs, housing, and credit. Targeting based on credit history—or characteristics systematically associated with it— enables predatory ads for high-interest loans. Targeting based on demographics, location, and political affiliation helps purveyors of politically motivated disinformation and voter suppression. All kinds of behavioral targeting increase the risk of convincing scams.

Instead of re-inventing the tracking wheel, we should imagine a better world without the myriad problems of targeted ads.

Google, Facebook, and many other ad platforms already try to rein in certain uses of their targeting platforms. Google, for example, limits advertisers’ ability to target people in “sensitive interest categories.” However, these efforts frequently fall short; determined actors can usually find workarounds to platform-wide restrictions on certain kinds of targeting or certain kinds of ads.

Even with absolute power over what information can be used to target whom, platforms are too often unable to prevent abuse of their technology. But FLoC will use an unsupervised algorithm to create its clusters. That means that nobody will have direct control over how people are grouped together. Ideally (for advertisers), FLoC will create groups that have meaningful behaviors and interests in common. But online behavior is linked to all kinds of sensitive characteristics — demographics like gender, ethnicity, age, and income; “big 5” personality traits; even mental health. It is highly likely that FLoC will group users along some of these axes as well. FLoC groupings may also directly reflect visits to websites related to substance abuse, financial hardship, or support for survivors of trauma.

Google has proposed that it can monitor the outputs of the system to check for any correlations with its sensitive categories. If it finds that a particular cohort is too closely related to a particular protected group, the administrative server can choose new parameters for the algorithm and tell users’ browsers to group themselves again.

This solution sounds both orwellian and sisyphean. In order to monitor how FLoC groups correlate with sensitive categories, Google will need to run massive audits using data about users’ race, gender, religion, age, health, and financial status. Whenever it finds a cohort that correlates too strongly along any of those axes, it will have to reconfigure the whole algorithm and try again, hoping that no other “sensitive categories” are implicated in the new version. This is a much more difficult version of the problem it is already trying, and frequently failing, to solve.

In a world with FLoC, it may be more difficult to target users directly based on age, gender, or income. But it won’t be impossible. Trackers with access to auxiliary information about users will be able to learn what FLoC groupings “mean”—what kinds of people they contain—through observation and experiment. Those who are determined to do so will still be able to discriminate. Moreover, this kind of behavior will be harder for platforms to police than it already is. Advertisers with bad intentions will have plausible deniability—after all, they aren’t directly targeting protected categories, they’re just reaching people based on behavior. And the whole system will be more opaque to users and regulators.

Google, please don’t do this

We wrote about FLoC and the other initial batch of proposals when they were first introduced, calling FLoC “the opposite of privacy-preserving technology.” We hoped that the standards process would shed light on FLoC’s fundamental flaws, causing Google to reconsider pushing it forward. Indeed, several issues on the official Github page raise the exact same concerns that we highlight here. However, Google has continued developing the system, leaving the fundamentals nearly unchanged. It has started pitching FLoC to advertisers, boasting that FLoC is a “95% effective” replacement for cookie-based targeting. And starting with Chrome 89, released on March 2, it’s deploying the technology for a trial run. A small portion of Chrome users—still likely millions of people—will be (or have been) assigned to test the new technology.

Make no mistake, if Google does follow through on its plan to implement FLoC in Chrome, it will likely give everyone involved “options.” The system will probably be opt-in for the advertisers that will benefit from it, and opt-out for the users who stand to be hurt. Google will surely tout this as a step forward for “transparency and user control,” knowing full well that the vast majority of its users will not understand how FLoC works, and that very few will go out of their way to turn it off. It will pat itself on the back for ushering in a new, private era on the Web, free of the evil third-party cookie—the technology that Google helped extend well past its shelf life, making billions of dollars in the process.

It doesn’t have to be that way. The most important parts of the privacy sandbox, like dropping third-party identifiers and fighting fingerprinting, will genuinely change the Web for the better. Google can choose to dismantle the old scaffolding for surveillance without replacing it with something new and uniquely harmful.

We emphatically reject the future of FLoC. That is not the world we want, nor the one users deserve. Google needs to learn the correct lessons from the era of third-party tracking and design its browser to work for users, not for advertisers.


Tags: Date-2021-03-03, Google, Internet


This page may have a talk page?.


Access and use of this site by any means implies consent to the Agreement of Use. © Hellene Sun