Who is sharing data with Google's FLoC ad algorithm?

Source: https://adalytics.io/blog/google-chrome-floc

The digital marketing, ad tech, and online privacy communities have recently seen a flurry of discussion on changes in the ad targeting landscape.

In 2019, Google, which controls nearly a third of the digital advertising market, announced a new initiative. According to Google, this will eliminate many forms of behavioral and demographic ad targeting, thereby paving the way for a “privacy-first future for web advertising.”

However, the Chrome product development team also introduced another technology called the Federated Learning of Cohorts (FLoC), first proposed in 2020. Google explains that FLoC is “a new way for businesses to reach people with relevant content and ads by clustering large groups of people with similar interests. This approach effectively hides individuals ‘in the crowd.’” FLoC is built into the Chrome source code, and clusters users together based on which websites they visit. Pages can access FLoC through a JavaScript API, which enables a new form of ad targeting.

There are many ongoing discussions as to the details of how FLoC will operate:

Will FLoC enable ad targeting of demographic categories? Can ads be targeted to HIV positive individuals, individuals suffering from drug addiction, or cybersecurity professionals based on which FLoC cohort they are in?
Does FLoC use all of a user's browsing history while using Chrome, or just the page loads from websites that have a relationship with Google (e.g. installing Google Analytics or Publisher Tags)?
How can advertisers independently audit the accuracy of FLoC targeting?

To help address some of these questions, this study manually ran several FLoC experiments. Based on these experiments, it appears that Chrome incorporates all browsing activity into FLoC calculations, even visits to websites about digital privacy, national security, or health topics. If a user browses the European Data Protection Supervisor (edps.europa.eu), Irish Data Protection Authority (dataprotection.ie), Immigration and Customs Enforcement (ice.gov), or US National Security Agency (nsa.gov) websites, all of these page navigations trigger updates in a user's Chrome FLoC ID.

Notably, the study found that the following domains did not opt-out of FLoC, which indicates they may be sharing their website's audiences with Chrome for ad targeting purposes:

European Data Protection Supervisor (edps.europa.eu)
Irish, German, French, and Belgian data authority websites
Trans-Atlantic Privacy Shield framework (privacyshield.gov)
Electronic Frontier Foundation (eff.org)
Mozilla Foundation (foundation.mozilla.org)
International Association of Privacy Professionals (iapp.org)
Secure the News (securethe.news)
UK's National Health Service (nhs.uk)
US Central Intelligence Agency (cia.gov)
US National Security Agency (nsa.gov)
US Federal Trade Commission (ftc.gov and identitytheft.gov)
US Federal Communications Commission (fcc.gov)
US Cybersecurity & Infrastructure Security Agency (cisa.gov)
US Department of Defense Sexual Assault Prevention and Response Office (sapr.mil)
US Cyber Command and US Special Operations Command (cybercom.mil, socom.mil)

Navigating to ‘sensitive’ websites does change your FLoC ID

There has been some discussion about whether or not Chrome will use a user’s browsing history on ‘sensitive’ websites in the FLoC calculation; will visiting websites on health information, government, adult content, or human rights be used to sort a given user into a certain FLoC cluster? Will this apply even if a given website does not run Google Ads or use Google Analytics?

In order to address this question, different ‘sensitive’ websites were visited, and the FLoC ID was monitored.

Google shares some documentation on their web.dev website for how a developer can try out the FLoC algorithm.

This study downloaded the latest Chrome Canary build (as of April 6th, 2021), which was version 91 of Chrome. Then, Chrome Canary was run from the command line, with FLoC features enabled as per Google’s documentation.

Then the Chrome browser’s Developer Tools console was opened, and a short JavaScript snippet was used to display the cohort IDs.

The Chrome Canary browser was used to navigate to various websites. After each navigation, the JavaScript snippet was run again to observe whether the FLoC cohort ID had updated in response to the page navigation.

Google Chrome GIF recording, showing how navigation to the European Data Protection Supervisor (edps.europa.eu) website triggers a new FLoC cohort ID to be displayed in the Chrome browser's developer tools console.

Google Chrome GIF recording, showing how navigation to the National Security Agency's website (nsa.gov) triggers a new FLoC cohort ID to be displayed in the Chrome browser's developer tools console.

On the other hand, navigating to The Guardian’s website (theguardian.com) or The Markup’s website (themarkup.org) does not trigger a new FLoC cohort ID, as both sites have opted out of enabling Chrome to utilize the FLoC API on their websites.

99.99% of websites are not opting out of FLoC

After observing the browser’s FLoC IDs updating upon page navigation to ‘sensitive’ websites which do not have any relationship with Google (i.e. they neither display Google Ads nor use Google Analytics), the next step was to check how many websites opted out of making their audiences accessible to the FLoC algorithm.

Tweet from privacy researcher Zach Edwards, showing how most website owners are not opting out of FLoC Origin trials in Chrome. In many cases, they are unable to opt-out because they do not control their web server's response header configurations.

The Google FLoC documentation states that a “website can opt out of all FLoC cohort calculation by sending the HTTP response header” Permissions-Policy with a specific flag. This “enables a site to declare that it does not want to be included in the user's list of sites for cohort calculation.”

For example, a site can opt out of all FLoC cohort calculation by sending the HTTP response header:

Permissions-Policy: interest-cohort=()

If the Permissions-Policy header is not configured, the “policy will be allow by default.”
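The opt-out check described above can be sketched in a few lines of Python. This is a simplified illustration, not Google's or Adalytics' actual code: the function name `opts_out_of_floc` is hypothetical, and the parsing assumes the header is a plain comma-separated list of `feature=allowlist` directives rather than implementing a full structured-field parser.

```python
from typing import Mapping


def opts_out_of_floc(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers carry `interest-cohort=()`.

    Hypothetical helper: simplified parsing, assumed for illustration.
    """
    # HTTP header names are case-insensitive, so normalise before matching.
    policy = ""
    for name, value in headers.items():
        if name.lower() == "permissions-policy":
            policy = value
            break

    # Directives are comma-separated; each looks like `feature=allowlist`.
    for directive in policy.split(","):
        feature, _, allowlist = directive.partition("=")
        if feature.strip().lower() == "interest-cohort":
            # An empty allowlist `()` means no origin may use the feature.
            return allowlist.replace(" ", "") == "()"

    # No header, or no interest-cohort directive: allowed by default.
    return False
```

A missing header and a header without the `interest-cohort` directive both return `False`, which mirrors the “allow by default” behavior quoted above.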

This study analyzed the 100,000 most popular domains according to the Tranco list, which ranks the most popular sites on the internet. Each domain was examined for the presence of a Permissions-Policy header. If a domain included such a header, the header contents were checked for the presence of the “interest-cohort=()” flag.

The analysis of 100,000 different domains found that only 10 websites have chosen to opt-out of FLoC via the Permissions-Policy header (as of April 7th, 2021).
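The headline percentage follows directly from these two counts. As a quick arithmetic check (the helper name `opt_out_share` is an assumption made for this sketch, not part of the study's tooling):

```python
def opt_out_share(opted_out: int, total: int) -> float:
    """Percentage of scanned domains that sent the opt-out header."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * opted_out / total


# Counts reported by the study: 10 opt-outs among 100,000 Tranco domains.
share = opt_out_share(opted_out=10, total=100_000)
print(f"opted out: {share:.2f}%, not opted out: {100 - share:.2f}%")
# prints "opted out: 0.01%, not opted out: 99.99%"
```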

AirTable showing which domains are currently opting out of the Chrome FLoC API. This was checked by looking at the Permissions-Policy HTTP response header on each website.

The ten websites that opted out of being included in the user's list of sites for cohort calculation were:

theguardian.com
duckduckgo.com
guardian.co.uk
brave.com
metafilter.com
bravesoftware.com
basicattentiontoken.org
themarkup.org
nic.ad.jp
gu.com

Adalytics reached out to a number of these websites and inquired as to why they chose to opt-out of FLoC.

Julia Angwin, the Editor-in-Chief of The Markup, wrote a piece on the implications of FLoC for user privacy. She also explained that her team did not want their readers to be targeted with ads by virtue of visiting The Markup’s website.

Tweet from The Markup Editor-in-Chief Julia Angwin, explaining why they chose to opt-out of FLoC's Origin trial via a HTTP response header on their website.

DuckDuckGo wrote on their blog that they encourage other websites to “take steps to protect the privacy of their users by opting out of FLoC, which would be applicable to all their visitors.”

Brave also stated that they are “opting out of FLoC on our websites [...] because we think FLoC (along with many other elements of Google’s “Privacy Sandbox” proposal-set) is bad for privacy and site businesses.” They also advised that “all sites should disable FLoC”.

Brave advises that failing to configure the opt-out HTTP response header may have negative ramifications for websites: “Default FLoC behavior will leak and share user behavior on your site, which will harm sites that have high trust, or highly private relationships, with their users.”

Given these commentaries on the implications of FLoC, this analysis observed that many ‘sensitive’ websites did not opt-out of FLoC at the time of writing. These included military, digital privacy, human rights, health information, and national security related websites, such as those listed in the introduction.

Visiting all of these websites in the first part of the analysis was empirically shown to update the FLoC cohort IDs in the Chrome Canary browser.

Conclusion

Caveats & limitations

This study used Chrome Canary (v91) to test the FLoC API’s behavior. This may not be representative of the FLoC algorithm’s behavior in future production releases. The study also intentionally visited a small number of specific websites to observe whether the FLoC IDs update after each page navigation. This certainly is not representative of a typical user’s browsing history and thus may skew the measured FLoC API outputs. Lastly, all experiments were conducted from a number of US IP addresses. The FLoC API may behave differently in other countries, such as EU member states.

The analysis also checked for “Permissions-Policy” HTTP response header values on a single date (April 7th, 2021). Websites may change their response headers at any point in the future.

Discussion

The results of this study and other commentaries raise a number of issues warranting further discussion.

The manual experiments conducted herein show that even if a website has no relationship with Google (no analytics or ads), it still appears to affect a Chrome user’s FLoC ID. This means that individuals who visit sensitive websites, like the European Data Protection Supervisor’s landing page or the National Security Agency’s website, could theoretically be targeted with ads based on these behavioral patterns. This is a potential national security risk: a previous Adalytics research study found that the widespread use of unsandboxed ad iframes makes it possible for foreign intelligence services to target Western military and intelligence officials with malware-laced ads. The Electronic Frontier Foundation also pointed out that FLoC IDs can theoretically make it easier to fingerprint individual users.

Google has stated that they will "analyze the resulting cohorts for correlations between cohort and sensitive categories". Google uses Natural Language Processing (NLP) to auto-parse page text content, and may be using the same data to determine if the audience of that content is being labeled into a protected category. This means that the underlying performance of Google's NLP technology is a critical factor in excluding 'sensitive' FLoC cohorts. However, research by ethical AI researchers such as Timnit Gebru showed large language models may have many biases and be hard to audit. Relying on text classification as a "double check" to remove sensitive cohorts built via FLoC's user clustering may thus be problematic. What will happen if a cohort of NSA or CIA employees is inadvertently created, and not excluded as "sensitive"?

Secondly, many website developers who wish to protect their users’ privacy and browsing history may not be aware of this new Chrome standard. FLoC is under development and was released in various parts over the last few years. As websites are included in FLoC calculations by default and must explicitly opt out, developers would need to be aware of FLoC to know that there is a need to update the Permissions-Policy header. Many websites also do not have control over their HTTP response headers if they are using certain hosting providers. Adalytics is one such website (though I reached out to my hosting provider to see if they can make this change). It would be ideal if Google designed other opt-out mechanisms, such as a <meta> tag that can be placed in the head of each website’s HTML.

The last consideration is a matter of marketing effectiveness. There are valid reasons for Chrome to want to eliminate third party cookies, following the examples of other browsers. However, if FLoC is designed to be a replacement mechanism for ad targeting, it needs to be secured against ad fraud and manipulation. Google says “FLoC can provide an effective replacement signal for third-party cookies.” Its simulations demonstrate that with cohorts, marketers can expect to see around 95% of the conversions per dollar spent when compared with cookie-based advertising.

Despite Google’s claims about conversion performance, information security researcher Jonathan Foote recently demonstrated that the FLoC API can be “monkey patched” to return whatever FLoC cohort ID a user desires. If advertisers are paying large sums of money to target CEOs or doctors, an ad fraud criminal could run a series of sophisticated bots that mimic the FLoC cohort IDs of the CEO or physician clusters. This could significantly degrade advertisers’ media investment performance.

Take away points

Google Chrome’s FLoC algorithm appears to update how it categorizes users when they browse to ‘sensitive’ websites which have no digital ads on them
99.99% of websites have not set the Permissions-Policy response header to opt-out of sharing their readers with FLoC’s cohort calculations
FLoC may be vulnerable to ad fraud and manipulation

Tags: Date-2021-04-06, Google, Internet
