How Fragmenting Media Can Carve a Path Toward a New Journalism
https://publicknowledge.org/how-fragmenting-media-can-carve-a-path-toward-a-new-journalism/ (Wed, 12 Nov 2025)

The crisis facing journalism does not have to spell disaster; it can mark a turning point in its transformation instead.

It is often said these days that legacy media may not be long for this world. But this may in fact be good news for journalism. While traditional mass communication is succumbing to a toxic mix of hostility from elected leaders, shady business deals, decreasing ad revenue, and broken trust, the tools for a renaissance of reporting are some of the very things killing the traditional legacy business model.

We must first distinguish between the business of delivering news and the actual matter of reporting it. While the current business models have been dying for a while (at least as far as producing actual news goes), the business model for supporting reporting and delivering news to the public has evolved and changed repeatedly over the last century and a half. Indeed, the very tools accused of killing the business of journalism – social media platforms and artificial intelligence – provide the tools for rebuilding actual journalism with modern, sustainable business models. But this won’t happen on its own. Through a mixture of carefully targeted government action (that avoids content regulation in violation of the First Amendment) and active choices by those who care about finding reliable sources of news, we can turn the predictions of doom into a virtuous cycle.

Concentration and the End of Traditional Journalism

Over the last 30 years, traditional media have consolidated to levels previously unimaginable. That consolidation drives constant cutting of reporters in the quest for “efficiencies” to pay off the massive debt these companies take on as they merge to please Wall Street investors. It reduces the production of real news, forcing outlets to recycle the same content over and over. Even worse, as addictive short-form social media takes over, traditional long-form media and broadcast are becoming irrelevant. Ongoing attacks from conservative elites have helped erode public trust in traditional journalism, while consumers increasingly get their news from so-called “truth tellers” who promote the online conspiracy theories that algorithms favor. Meanwhile, people searching for information must often sift through massive amounts of online content, many of them unconcerned with distinguishing credible information from deliberate falsehoods.

Comprehensive, factual, and comparatively measured journalism is already at a disadvantage in the competition for our attention online. The comparatively small amount of high-quality news content is drowned out by a high volume of misinformation, AI slop and other low-quality, more easily produced content that our lizard brains are naturally more responsive to. The problem is compounded by a decades-spanning trend of newspapers being bought up and consolidated by hedge funds and private equity firms, only to have to desperately claw at an ever-shrinking pot of ad money. This financial squeeze has contributed to the disappearance of 40% of local outlets in the last 20 years, meaning fewer community stories are ever reported. 

Meanwhile, so-called “digital natives” are flocking to social media platforms, while newspapers’ and broadcasters’ most active viewers turn gray. This new generation of readers/watchers brings a new set of preferences: Gone are the days of skimming the daily paper along with your morning coffee or gathering around the television for your family’s favorite evening news program; this new audience opts instead to leave what news reaches them, and when, up to the algorithms. In a typical information system, we might have called these people “information seekers,” but the current system is far from typical, and the data tells us modern audiences aren’t doing much seeking at all. Social media has fundamentally changed our news consumption behaviors and will undoubtedly impact the relationship future generations will have with the news.

Unfortunately, the same factors afflicting mainstream media are also eroding an already tenuous public trust and driving audiences toward other sources of information, namely news influencers, or “newsfluencers,” who take the form of bloggers, content creators, podcasters, and streamers. Some of these individuals produce original journalistic work, but many merely comment on the news or spread misinformation and conspiracy theories, all while disparaging actual journalists. For better or worse, the internet is not a newsroom: there are no codes of ethics, few fact-checking protocols, and no editors-in-chief. As audiences increasingly gravitate toward these newsfluencers over objective reporting, the standards for high-quality journalism slip or evaporate entirely – and the individual competencies needed to traverse this evolving media landscape are far from universal. But the fragmentation of our media and news landscape isn’t necessarily a bad thing; it may even be the solution journalism needs.

Fragmentation: A new beginning?

You get the picture – the Fourth Estate is going through a tumultuous and at times painful transition. But on the other side of this awkward intermediate stage could be a future where journalists can deliver the news without dependence on ratings or fear of reprisal, and an opportunity to create healthier, more sustainable information systems for everyone. This next form also broadens who can act in a journalistic capacity – not just graduates of journalism programs, but anyone with a computer and a penchant for truth-seeking.

The civic information we all rely on should exist in a digital landscape that is open, safe, and unencumbered by the interests and whims of billionaires and monopolistic platforms. The “digital infrastructure” that delivers our news shapes the quality of reporting, the public’s media literacy, and the health of our information system as a whole – and the existing structure is flimsy at best. Beyond journalists themselves, the digital future we’re moving toward will require contributions from all stakeholders – lawmakers, platforms, journalists, and information seekers – to create and participate in a healthier information system that benefits us all. To accomplish this, we will need to identify and lower barriers for current and emerging independent journalists while upholding high journalistic standards and protecting the public interest.

Ref, do something!

As the infrastructure underpinning online enterprise and access to information, civic information should be treated like a public good rather than the cash cow of just a few companies. It is imperative that policymakers rein in Big Tech before it swallows up any hope of a free, thriving marketplace for information sources.

Any journalist will attest that online platforms like X or Bluesky are crucial for reaching an audience (and even for newsgathering). But a journalist’s content is at the mercy of just a handful of monopolistic platforms with a stranglehold on their access to revenue and readership. By controlling the digital ad market, picking and choosing what content wins the contest for visibility, and diverting viewers from original content with AI overviews and bot crawlers, Big Tech is obstructing information seekers’ access to high-quality sources. 

Existing proposals aimed at addressing this would prevent anticompetitive practices like discrimination and self-preferencing, while also promoting interoperability – giving journalists the ability to migrate to more favorable platforms with their audiences if, say, they are suspended from a platform by a vengeful billionaire for doing their job. In this same vein, a federal data privacy and portability law, like the California Consumer Privacy Act (CCPA), would codify users’ rights to data privacy and to easily move between platforms of their choosing. Other proposed legislation would target the shady and lopsided advertising market with an ad tech structural separation bill.

The platform giveth, and the platform taketh away 

In conversations about the harms emerging from the digital landscape, there is a tendency to demonize the algorithms that organize and deliver content rather than focusing on the platforms that engineer them. Algorithms are a tool like any other, and they can be designed and deployed without suppressing user speech or invading digital privacy. Government attempts to regulate content, or the algorithms that serve as channels for that content, are not only often impractical but also threaten free speech. Barring very specific circumstances involving actual illegal content, content moderation should be left to platforms to figure out.

Rather than attempting to regulate the algorithms, platforms should integrate evidence-based design that puts the power to assess and avoid misinformation in the hands of information seekers. One popular approach has been crowdsourcing; X, formerly known as Twitter, was an early adopter of user-led fact-checking with a community notes feature that has so far proven to be an effective tool for ensuring information integrity online. A similar method was applied in a study where researchers developed and tested a platform which prompts creators to evaluate their content for accuracy before posting and allows users to share and view accuracy assessments on content, then filter their feeds accordingly. Decentralizing moderation in this way does not guarantee that high-quality news content will always come out on top, but it could empower users to sort through the slop – and it’s certainly a more constructive approach than regulation that threatens free speech and fails to directly address the harms to our information systems. 
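As a rough illustration of how this kind of decentralized, user-driven filtering could work, here is a minimal sketch in Python. It is a hypothetical example, not a description of any existing platform’s implementation: posts carry crowd-sourced accuracy ratings, and each reader filters their own feed against a threshold they choose, so visibility decisions stay with the user rather than with a central moderator.

```python
# Hypothetical sketch: crowd-sourced accuracy ratings used for per-user feed
# filtering. Names and thresholds are illustrative, not any platform's API.
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Post:
    author: str
    text: str
    accuracy_ratings: list = field(default_factory=list)  # crowd scores, 0.0 to 1.0

    def crowd_accuracy(self) -> float:
        # Posts with no ratings yet get a neutral default score.
        return mean(self.accuracy_ratings) if self.accuracy_ratings else 0.5


def filter_feed(posts, minimum_accuracy=0.6):
    """Keep only posts whose crowd-assessed accuracy clears the reader's threshold."""
    return [p for p in posts if p.crowd_accuracy() >= minimum_accuracy]


feed = [
    Post("local_reporter", "City council passes new zoning rule.", [0.9, 0.8, 0.95]),
    Post("patriot90", "Miracle cure suppressed by doctors!", [0.1, 0.2]),
]
for post in filter_feed(feed):
    print(post.author, "-", post.text)
```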

Social media platforms generate billions of dollars in ad revenue each year, yet creators see very little profit. Creators, especially journalists working in the public interest and providing superior content, should have pathways for compensation for their work. Many high-profile journalists have set up shop on the blogging platform Substack, which offers fellowships, cash prizes, and a creator fund on top of its subscription structure. Monetization tools like these are promising, as long as they remain transparent. But while Substack is a current favorite for journalists and readers, it’s hardly perfect. The drawback here is one found with most platforms: once established on the app, it is difficult to move your data and audience elsewhere. Substack offers an in-app subscriber exporting function, but still limits creators’ migration by differentiating subscribers from mere followers, meaning a significant portion of a creator’s readership could be left behind. Clearly, there is still room for improvement with these emerging platforms, and policy that encourages interoperability and user data portability will be critical to online users’ continued access to their preferred news sources.

(News)paper beats rock

For all the challenges facing journalists because of AI, particularly the Sisyphean struggle to compete with generative AI and the ongoing conflict between news publishers and AI developers, there are also a number of ways it could help journalists to protect and profit from their work. Recently, Public Knowledge proposed a policy framework involving voluntary collective licensing, standardized signaling mechanisms, and public-interest safeguards that would strike a balance between fair use and sustainable business practices for publishers. Any negotiations over the use of journalistic works must involve smaller publishers and independent journalists, not just the biggest media companies with the most bargaining power.

Good journalism is grueling work; it’s slow and costly to produce, sees little return on investment, and then has to compete with the slop filling our feeds. The same tools that have expanded access to online information can also drown out the most trustworthy sources. But emerging technologies can just as easily bolster news production. To name just one example, some of the largest newsrooms in the United States are already taking advantage of AI for tasks like data analysis, headline generation, transcription, and translation, or launching chatbots that answer reader questions. While these proprietary technologies are generally more available to large publications than to independent journalists and newsfluencers, smaller news creators are finding ways to make AI work for them.

For journalists with affordability and centralization concerns, open source AI could be a cheaper and (hopefully) more trustworthy alternative. Public Knowledge has championed open AI systems as a vital force for accessibility, innovation and transparency in AI development, but journalists are now finding applications for them, too. With access to the models, datasets and tools provided by developers, journalists can streamline their work and provide feedback to improve or specialize AI tools. While we have considerable reservations about President Trump’s AI Action Plan, it does include a push for embracing open source AI that shows promise. With the weight of the White House behind it, open source AI could soon be a more widely-available tool with the power to revolutionize the newsgathering and producing process.
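As a concrete, purely illustrative example of the newsroom tasks described above, the short sketch below uses the open source Hugging Face transformers library to compress an article draft into a headline-length summary for a human editor to revise. The specific model named is an assumption chosen for the example, not a recommendation; any real newsroom would pick and vet its own tools.

```python
# A minimal sketch, assuming the open source `transformers` library is installed
# (pip install transformers) and a small summarization model can be downloaded.
from transformers import pipeline

# Model choice here is illustrative; swap in whatever vetted model a newsroom prefers.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = (
    "The city council voted 6-1 on Tuesday to approve a new zoning ordinance "
    "that allows small multifamily buildings in neighborhoods previously "
    "reserved for single-family homes, after months of public hearings."
)

# Generate a short, headline-length summary that a human editor can then revise.
summary = summarizer(article, max_length=20, min_length=5, do_sample=False)
print(summary[0]["summary_text"])
```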

Don’t eat the slop

The ugly truth of the matter is that the pursuit of online engagement ultimately steers algorithms, open protocol platforms, and money away from high-quality journalism. Social media users must take responsibility for their own information diet; just because slop is put in front of you doesn’t mean you have to eat it. We need better education and training to increase digital literacy and awareness of the journalists creating worthwhile content. Such initiatives are expanding alongside the rise of generative AI: the White House has assembled a task force aimed at fostering AI literacy and proficiency in education, and media literacy mandates for K-12 education are cropping up in several states. State action should be supported with federal funding and resources – something that has been attempted before but has yet to be enacted.

As bleak as things may currently seem, digital users are expressing frustration with the slop-ification of their feeds in a slew of viral video essays, think pieces, and organized movements. While overall social media usage trends downward, news consumption continues to rise as a primary reason to log on. Clearly, there is still a healthy consumer appetite for high-quality information. And luckily, online users have a wealth of technological tools at their disposal with which to fact-check, diversify news consumption, and seek out high-quality content. As long as there are independent journalists working diligently in the public interest, assisted by a digital infrastructure that works for them rather than against them, there will always be an audience prepared to wade through a bog of 24k gold Labubus and Sora 2-generated fail compilations to reach them.

The crisis facing journalism does not have to spell disaster; it can mark a turning point in its transformation instead. To survive the collapsing of the norms and business models we’ve grown accustomed to, all stakeholders in our information systems will have to adapt. By breaking up the monopolistic hold of platforms on our information, empowering rising independent voices, and expanding agency and digital literacy for everyone online, we can equip information providers and seekers with the tools they need to navigate the online world and reimagine a digital infrastructure that is more equitable, transparent, and participatory. This is our opportunity to ensure a sustainable future for news – and our democracy. Let’s not blow it.

Is There a Middle Ground in the Tug of War Between News Publishers and AI Firms? Part 1: Framing the Problem
https://publicknowledge.org/is-there-a-middle-ground-in-the-tug-of-war-between-news-publishers-and-ai-firms-part-1-framing-the-problem/ (Mon, 22 Sep 2025)

The tug of war between online news publishers and AI developers on the role of copyrighted content in AI model training may lead to a more closed internet for everyone. In this two-part blog series, we describe the situation as it’s unfolding and propose policy solutions worth exploring to preserve incentives for publishers to keep creating the timely content that AI developers need and democracy requires. View the second post, “Is There a Middle Ground in the Tug of War Between News Publishers and AI Firms? Part 2: Framing Solutions” to continue the series.

In July, the Senate Judiciary Subcommittee on Crime and Counterterrorism hosted a hearing with the provocative title, “Too Big to Prosecute? Examining the AI Industry’s Mass Ingestion of Copyrighted Works for AI Training.” Happily, some of the conversation with the four witnesses was actually about the crime of online piracy, meaning how artificial intelligence, or AI, firms have downloaded content from shadow libraries (unofficial pirate repositories of digital books and articles) to train their large language models (LLMs). However, the majority of the hearing served only to conflate “piracy of copyrighted works” with “training generative AI models on copyrighted works.” The former may be illegal, but we believe the latter is generally a protected fair use under the legal doctrine that allows limited use of copyrighted works without permission. 

The hearing was a reflection of the tremendous public attention on the intersection of generative artificial intelligence, copyright, and online publishing. Although the hearing largely focused on book publishers, many of the issues discussed also pertain to another type of publisher Public Knowledge has written about quite a bit: online news publishers. They, too, often use the language of “stealing,” “taking,” or “hoovering” to describe what digital platforms do with their copyrighted content, and believe it to be a type of theft. We find that the argument that digital platforms “steal” news content misconstrues copyright law and conflates two very different ideas: piracy and AI model training.

In our view, web crawling for the purpose of AI training implicates the freedom to learn, to freely use information, and to freely express creativity. We also acknowledge the genuine economic risks generative AI poses to creators, including news publishers. And we know that some aspects of AI model design may infringe on content owners’ rights to control the reproduction, distribution, and public display of their work. For example, model overfitting, including memorization of training data, may result in infringing outputs. However, when we apply existing copyright law to our best understanding of generative AI systems, we find that their core elements are consistent with the law.

In Part 1 of this two-part blog series, we discuss why news publishers fear the impact of AI on their business models and whether these concerns are warranted. In Part 2, we describe strategies publishers are already using to mitigate the impact of AI on their business models, list additional solutions that are emerging to empower them against the threat generative AI represents, and identify policy solutions worthy of additional exploration to preserve the benefits of fair use while preserving incentives for publishers to keep creating the content AI developers and the public need.

AI Developers and News Publishers Are… Frenemies?

Whether they acknowledge it or not, AI developers and news publishers are increasingly codependent. Online news publishers play a crucial role in AI training by providing large datasets of high-quality, human-generated content. Whether through model training or retrieval augmented generation (more on these later), journalism grounds AI models in reality and in the now. The currency and relevance of AI generated outputs depend on access to timely sources of content. Journalism provides factual reporting, context, and in-depth analysis of real-world events and issues happening in the moment. By training AI on diverse journalistic sources, models learn to recognize and mitigate biases present in their training data. And using journalism for model training can help improve fact-checking and combat propaganda and false information. That may be why studies show the training sets underlying LLMs “significantly overweight publisher content” compared to the generic collection of content scraped by Common Crawl. The result: If their outputs undermine all the viable business incentives and models that sustain online news publishing, AI developers will eventually be in a world of hurt.

Conversely, journalism organizations need to understand and, where appropriate, leverage AI systems to adapt to the realities of a digital media landscape. With the rise of the internet, then search and social media, and now AI-mediated search and information distribution, news publishers have been forced to grapple with rapidly changing technology that disrupts their business models. Additionally, publishers are always looking for opportunities to reduce costs, streamline news gathering, facilitate translation, and engage customers, and are therefore aggressively seeking ways to leverage the substantial benefits of AI in their own operations. (This is not without controversy, including as to whether AI tools must be subject to the same journalistic editorial standards that pertain to human journalists. There is also evidence that the vast majority of businesses – 95% – have yet to see real efficiencies from AI materialize.) Lastly, journalists will need to leverage AI to conduct forensics and ensure the legitimacy of images and videos that may have been created with AI tools. 

AI developers and publishers also both have self-benefiting roles to play in maintaining the open and free nature of the internet. Certainly government-funded research, commercial innovation, public policy, and infrastructure have been critical to creating the internet as we know it today. But a lot of content and services (like search) are accessible and free to internet users today because of appealing publisher content and the advertising that supports it. Ensuring the viability of online publishing – whether it takes the form of newspapers, personal blogs, Substacks, or other models – allows a wider range of content and services to remain accessible and free. At the same time, news publishers have relied on users (and platform algorithms) freely sharing and promoting their content to drive clicks, views, and advertising revenues. Allowing AI models to “read” and learn from online content is an essential aspect of an Open Internet, and it may create substantial economic and societal benefits. All of this is built on the premise of a free and open web.

Lastly, journalism shares AI developers’ interest in permissive copyright rules and strong fair use protections. Journalists are themselves highly dependent on the legal doctrine of fair use – for criticism and commentary, news gathering and reporting, republishing source material, illustration, historical reference, and documenting claims. Hollowing out fair use or dramatically expanding intellectual property rights could whip around and harm journalism itself.

Given these relationships, in our view policy solutions must be developed to ensure that the benefits of generative AI are shared by the body politic writ large without undercutting the journalism necessary for democracy’s survival.

Why Publishers Fear Generative AI 

News publishers have long believed that dominant digital platforms unfairly – and in some cases, illegally – exploit their work. The platforms’ aim, this theory goes, is to garner most or all of the joint value created through their longstanding exchange with publishers: user engagement from news content on search and social media platforms in exchange for referral traffic provided to publishers through links. The current challenges in the news industry predate the internet, but there is no debate that digital disintermediation has dramatically impacted the structure and economics of news delivery. This has led publishers, in some cases, to pursue solutions (like link taxes) that are incompatible with copyright law as well as the principle of an Open Internet.

Now AI, especially generative AI and its embedment in search products, chatbots, and agents, has exacerbated news publishers’ concerns about the devastation platforms have caused to their business models. For example, generative AI’s ability to provide complete narrative answers to some of the most complex user search queries right on the search engine results page undermines the need to click through to online publishers for more information. (When Google rolled out AI Overviews, now AI Mode, in May of 2024, the company actually explicitly promised to users that AI overviews are the perfect solution when “you don’t have time to piece together all the information you need.” In other words, a zero-click search is the product benefit of AI Overviews.) Or, chatbots and agents trained on news publishers’ copyrighted content answer user queries about current events instead of search engines. That means the flow of traffic, ad dollars, and profit could continue to shift toward the dominant AI companies and away from publishers, spiking the trend line in place for decades. 

This challenge isn’t just about model training. AI models are now often complemented by grounding processes, by which AI models are connected to real-time information to improve the accuracy and relevance of their outputs. One example of a grounding process is retrieval augmented generation (RAG), a technique that accesses web pages as part of a query to improve the accuracy and currency of the AI model’s outputs. These kinds of enhancements require access to current information – like today’s news – to validate or update responses that would otherwise be based on prior generations of training data. While some of these responses include citations or links to the source, publishers believe these technologies will result in (even) less traffic, (even) fewer ad dollars, and (even) fewer subscription conversions. Publishers also highlight the risk of brand erosion due to AI slop, hallucinations, and misattribution (or lack of attribution) to the right news sources. 
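To make the mechanics concrete, here is a deliberately simplified sketch of a RAG loop in Python. Everything in it is illustrative: the keyword-overlap retrieval stands in for a real vector search over fresh news content, and generate_answer() is a stub standing in for a call to whatever model an actual system uses.

```python
# Illustrative RAG loop: retrieve current documents relevant to a query, then
# hand them to the model as grounding context. Not any real system's API.

def retrieve(query: str, articles: list, top_k: int = 2) -> list:
    """Naive keyword-overlap retrieval standing in for a vector search."""
    terms = set(query.lower().split())
    scored = sorted(
        articles,
        key=lambda a: len(terms & set(a["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def generate_answer(prompt: str) -> str:
    # Stub: a real system would call a large language model here.
    return "[grounded answer based on]\n" + prompt


def answer_with_rag(query: str, articles: list) -> str:
    context = retrieve(query, articles)
    prompt = "Answer using only these sources:\n"
    for doc in context:
        prompt += f"- {doc['source']}: {doc['text']}\n"
    prompt += f"Question: {query}"
    return generate_answer(prompt)


articles = [
    {"source": "Local Gazette", "text": "The bridge closure begins Monday for repairs."},
    {"source": "Daily Ledger", "text": "Quarterly earnings rose for regional retailers."},
]
print(answer_with_rag("When does the bridge closure begin?", articles))
```

The point is that grounding depends on a steady supply of timely reporting: the retrieval step has nothing useful to hand the model if that reporting dries up.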

This line of thinking doesn’t even account for the likely adoption of an advertising-based business model for AI products. That would mean even more ad dollars migrating from publishers to AI firms. Google is already selling ads within AI Overviews. Other AI firms are likely to adopt ad-based business models, as well, for two reasons: it’s the business model the dominant platforms already rely on, and newer AI companies have had little success in attracting paid users. (Publishers’ concerns also do not factor in Google’s brand-new “Preferred Sources” feature, which lets users “select their favorite sources” to be placed most prominently in search results “when those sources have published fresh and relevant content for your search.” Preferred Sources may serve to further marginalize small and diverse news sources, as users are generally more familiar with major, national media brands.)

Lastly, news publishers, like many others, are concerned about false information from chatbots and the impact that “pre-digested verdicts” to important queries, shaped by opaque algorithms and advertising- and engagement-based financial incentives, will have on our overall information environment. 

Early Impact of Generative AI on News Publishers

Publishers aren’t crazy – they’re already reporting the damage generative AI and its offspring are causing to their cost structure and revenue. (Yes, it’s fair to say these trends are getting media coverage in part because publishers are trying to make the case for protective legislation. But without data from AI firms to refute it, this is the story legislators are hearing and acting on. More on that later.)

Alarmed by Declines in Traffic

The emergence of new AI tools (like OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, and Perplexity) has resulted in more search referrals to many publishers. However, the increase in referrals is not compensating for a higher rate of zero-click searches derived from Google’s AI-powered search overviews. (Search engine optimization or SEO agencies working on behalf of advertisers are obsessing over precisely which keywords, industries, and geographies are more likely to trigger an AI overview. The consensus seems to be that they currently show up in ~20% of searches.) As predicted, links from AI-powered search overviews have plummeted relative to traditional search queries (which have also become unpredictable). Consumer search behavior also seems to be changing: New consumer research shows that Google users who encounter an AI summary are 50% less likely to click on links to other websites than users who see a traditional search result. Why click on blue links if everything you need to know appears upon your query? Google users who encountered an AI summary also rarely – 1% of the time – click on a link in the summary itself. And Google users are more likely to end their browsing session entirely after visiting a search page with an AI summary than on pages without a summary. New data from Digital Content Next’s membership of 19 digital publishers shows median year-over-year referral traffic from Google Search down 10% for the most recent eight-week period. News brands, which may still be able to cover breaking news in ways AI cannot, fell 7%. (Early in August, Google maintained that “total organic click volume” is stable, but goes on to emphasize other measures such as “click quality,” the presence of more links on the search engine results page, and the shift in traffic to different kinds of content. The company’s post generated spirited feedback from publishers and SEO experts in the U.S. and U.K.)

Overwhelmed by New Traffic from Crawlers and Bots

Well upstream from traffic and ad revenue, publishers are taking on new costs as AI training data crawlers and bots overwhelm their systems. 

For example: TollBit was one of the first platforms to enable websites to monetize their content by charging AI companies and bots for access, so it has some of the most extensive history regarding AI crawler behavior. In its most recent quarterly “State of the Bots” report, TollBit reported that total AI user agent traffic among the TollBit customer network grew 87% from the last quarter of 2024 to the first quarter of 2025. This is likely due to higher rates of adoption of these tools among users. Within this total, for the first time, traffic from retrieval augmented generation bots exceeded traffic from training bots, growing at nearly 2.5x the rate of training bot traffic. If this trend continues, it will mean that supporting AI user agent traffic will be an ongoing and increasing cost for publishers as adoption grows. And as noted above, TollBit found that referral traffic from AI bots was still minuscule – just 0.04% of all external referrals to network sites in Q1 2025 – and nowhere near enough to offset the broader decline in traffic from traditional search sites. 

Frustrated by Lack of Control Over Access

Publishers, faced with heavy scraping loads from AI firms but seeing little return in monetizable traffic, are increasingly pushing to assert greater control over how their content is used for training and real-time AI queries. This can be technically complex. For example, some of the largest AI products, like Google AI Overviews, Microsoft Copilot (Bingbot), and Apple’s AI tools (Applebot), do not separate their AI user agents from their search ranking crawlers. Publishers risk losing all their visibility to platform users if they try to manage or block these firms from accessing their content. Publishers see the need to control how their content is used for AI training as a way to counter these technology companies’ monopolistic power. But blocking search ranking crawlers can be business suicide.

Other AI firms simply ignore the Robots Exclusion Protocol, robots.txt, that publishers use to notify technology platforms that they do not wish to have their content crawled. TollBit’s network data, for example, suggests that disallowing real-time scraping by retrieval augmentation bots via robots.txt has zero impact on the referrals the AI apps deliver – they’re still crawling. AI firms may also be using third-party scrapers, stealth scrapers, or masked user agents that continue to scrape sites despite the exclusion protocol. They may also pull cached content from search engines or scrape it from the Internet Archive. This has led some online publishers to block the Internet Archive so their content cannot be scraped from the Wayback Machine. As a result, both publishers and AI firms – as well as internet users in general – lose important pieces of digital history.
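For readers unfamiliar with the protocol, here is a minimal sketch using Python’s standard library robot parser to show what a robots.txt file actually expresses. The rules and site here are made up for the example (Googlebot and GPTBot are real, published crawler tokens), and nothing in the protocol obliges a crawler to honor the file, which is exactly the problem publishers describe.

```python
# Minimal sketch of the Robots Exclusion Protocol using Python's standard library.
# The file below illustrates a publisher allowing a search crawler while
# disallowing an AI training crawler; compliance is entirely voluntary.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ["Googlebot", "GPTBot"]:
    allowed = parser.can_fetch(agent, "https://example-news-site.com/story")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```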

In Part 2, we describe strategies publishers are using to respond; inventory solutions that are emerging to empower them against the threat generative AI represents for their business models; and identify some promising policy solutions that preserve the benefits of the fair use doctrine while still preserving incentives for publishers to keep creating the timely content that AI developers need for their business model – and citizens need to stay informed.

Piracy vs. Fair Use: How AI Training Intersects With Copyright Law
https://publicknowledge.org/piracy-vs-fair-use-how-ai-training-intersects-with-copyright-law/ (Tue, 16 Sep 2025)

Words matter.

As tools like ChatGPT, image generators, and other AI systems rapidly enter the mainstream, they’ve also ignited heated debates about copyright, fair use, and the future of creativity. Central to the conversation is a question that keeps resurfacing: is training AI on copyrighted works the same thing as piracy?

Some policymakers want you to believe the answer is yes. To them, when an AI system ingests text, images, music, or code from copyrighted sources, it is no different from downloading a pirated movie from an illegal torrent site. But legally and practically, that comparison doesn’t hold up. While both involve copyrighted works in some way, they fall into entirely separate categories of use under copyright law. Understanding the difference matters, not just for copyright lawyers, but for anyone who cares about creativity, innovation, and how we set rules for emerging technologies.

Piracy: Clear-Cut Violations

First, let’s define piracy in the copyright context. Piracy is not a legal term, but simply refers to obvious instances of copyright infringement: the unauthorized reproduction, distribution, or public performance of copyrighted works without permission from the rights holder. That could mean selling bootleg DVDs on a street corner, running (but not using!) an illegal streaming website, or downloading the latest hit album from a peer-to-peer service without paying for it.

Colloquially, the key features of piracy are pretty straightforward:

  • Wholesale unauthorized copying or distribution – The pirate makes complete, exact or near-exact copies of a copyrighted work without permission.
  • Market substitution – The pirate’s actions provide consumers with the copyrighted work in a way that directly competes with legitimate sales or licenses.
  • Commercial or personal use – Piracy can happen whether someone sells bootleg copies for profit or downloads them for free, but the goal is getting to enjoy or use the work.
  • No obvious fair use defense – The use of the work isn’t transformative, copies the entire work, and has no clear publicly beneficial use like education.

Piracy is illegal because it infringes the exclusive rights of copyright holders to reproduce, distribute, or publicly perform their work. It is especially unambiguous in instances of commercial piracy, or where individuals violate copyright to avoid paying for commercially available works for personal enjoyment. If you burn 100 copies of a movie and sell them on the street, that’s piracy, plain and simple. If you torrent the latest Marvel movie to avoid buying a Disney+ subscription, that’s piracy, plain and simple.

AI Training: A Transformative Use

Now compare piracy (which, again, always involves making unauthorized exact or near-exact copies) to how AI systems are trained. Training a large language model (LLM) or image generator involves feeding vast amounts of data such as text, images, audio, or video into a machine learning system. The system processes these works to detect patterns, relationships, and statistical structures in the data.

The critical point: AI models do not store or distribute the copyrighted works they are trained on in the way pirates do.

Instead, the training process works more like this:

  • The system ingests the training data and converts it into mathematical representations (vectors, weights, and parameters).
  • The original works are not preserved in the final model. Instead, the system captures statistical information about how words follow one another, how images are composed, or how sounds combine into melodies.
  • Once training is complete, the model generates new outputs based on probabilities, not by reproducing specific works or parts of them.

This training process almost always requires making temporary copies of copyrighted works during the process. But the end product is not a copy of those works; it is a statistical model capable of generating new, original content.
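A toy example can make this concrete. The sketch below is a deliberate oversimplification, not how production LLMs are built: it “trains” a bigram model on a sentence, and what the model retains is a table of word-transition counts (statistics about the text) from which new text is sampled, rather than a stored copy of the text itself.

```python
# Toy "training" run: the resulting model is a table of word-transition counts,
# not a copy of the source text. Real LLMs learn billions of continuous
# parameters, but the statistical character of what gets stored is the point.
import random
from collections import defaultdict

corpus = "the quick brown fox jumps over the lazy dog and the quick cat sleeps"
words = corpus.split()

# "Training": count how often each word follows each other word.
transitions = defaultdict(lambda: defaultdict(int))
for current, nxt in zip(words, words[1:]):
    transitions[current][nxt] += 1

# "Generation": sample new text from the learned statistics.
word = "the"
output = [word]
for _ in range(8):
    followers = transitions.get(word)
    if not followers:
        break
    choices, weights = zip(*followers.items())
    word = random.choices(choices, weights=weights)[0]
    output.append(word)

print(" ".join(output))
```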

This distinction matters enormously under copyright law. Copying for training is transformative: it uses the works for a fundamentally different purpose from the original, much like indexing websites for search engines or scanning books for text analysis.

Fair Use

The legal doctrine that governs this distinction is fair use, a cornerstone of U.S. copyright law. Fair use allows certain unlicensed uses of copyrighted works when those uses are socially beneficial, transformative, and do not undermine the market for the original.

Courts weigh four factors when assessing fair use:

  1. Purpose and character of the use – Is the use transformative, and does it add new meaning or purpose?
  2. Nature of the copyrighted work – Is the work factual or creative?
  3. Amount and substantiality used – How much of the work is used, and is it reasonable for the purpose?
  4. Market effect – Does the use substitute for or harm the market for the original work?

AI training clearly checks many of these boxes in favor of fair use. The purpose of an AI developer making copies of copyrighted works during training is not to enjoy the expressive value of the work: it is to extract information about the composition of language, images, or sound or the relationships between ideas. The works are transformed into statistical data, not consumed as creative expressions. And training does not typically compete with the market for the original works, since people don’t use an AI system as a substitute for buying the latest novel or film.

This makes AI training much more like recognized fair uses in past cases:

  • Authors Guild v. Google (2015) – Google scanned millions of books to make them searchable. Courts ruled this was fair use because it was transformative and did not replace the market for the scanned books.
  • Kelly v. Arriba (2003) – A search engine’s use of copyrighted images as thumbnails was found to be fair use because it served a different function from the original artworks. Again, the use was transformative: to make content findable, not to substitute for it.
  • Sony v. Connectix (2000) – Intermediate copying of protected code during reverse-engineering to build a non-infringing emulator was fair use. Here, making a copy of copyrighted material just to learn from a work’s unprotectable aspects can be transformative.

Two recent court decisions, Bartz v. Anthropic and Kadrey v. Meta, affirmed that AI training is a transformative fair use of copyrighted content. They also illustrate how AI training differs from piracy: in both cases, the AI companies had downloaded pirated books to train their systems, and both decisions found that using those materials for training did not negate fair use. However, the companies remained potentially liable for copyright infringement for acquiring and keeping pirated books they could have procured through legal means.

Addressing Common Counterarguments

“AI Outputs Can Resemble the Training Data.”

Critics worry that AI can sometimes reproduce near-verbatim excerpts of training material. While this can happen with substantial effort, it is readily addressed by traditional copyright law. An infringing output is just that – infringing. Courts don’t need to get into the guts of training or evaluate models in their entirety in order to apply basic infringement analysis to specific instances of infringement.

“Creators Deserve Compensation.”

Some argue that even if AI training is fair use, it feels unfair for creators not to be paid when their works are used as data. This is a legitimate (and ongoing) policy discussion, but it’s distinct from the legal question of piracy. The law allows fair use even when it doesn’t involve licensing or payment – because copyright has always been balanced against the public interest in innovation and free expression. Copyright is not an absolute right to control every use of a work, and it is important to preserve distinctions between piracy as obvious violations of existing rights compared to the complexities raised by uses like AI training. 

“They Should Just License the Works.”

Some argue that AI companies should simply license copyrighted works for training, but this overlooks how copyright law has long acknowledged (and even protected) unlicensed uses that serve the public good. Just as search engines don’t need permission to index the web and researchers can mine data without licensing every journal, AI training is transformative, non-substitutive, and provides broad benefits. A voluntary direct licensing market is already developing between AI firms and publishers. But forcing all developers to license potentially billions of works across creative sectors would be unworkable as the only pathway to AI development, and could give a few Big Tech companies outsized gatekeeping power, stifling new and transformative technology. Fair use exists to ensure copyright doesn’t become an absolute veto over socially valuable uses of information.

Conclusion

Words matter. Calling AI training “piracy” or “stealing” or “theft” may be rhetorically powerful, but it is legally inaccurate and dangerously misleading. Piracy is an obvious kind of copyright infringement: clear, unauthorized copying and distribution that substitutes for original works, with no higher purpose than making a buck off the unaltered work of others. AI training, by contrast, is a transformative process that extracts statistical information from works without substituting for them, and it results in a completely new piece of technology as the end product.

Copyright law, through fair use, has long recognized the importance of allowing transformative uses that enrich the shared commons of knowledge, creativity, and technology. Search engines, digital libraries, scientific research, fan fiction, and even YouTube reaction videos all rely on the principle that not every use requires permission, and our society has benefited immensely from it. Whether AI training’s social value ultimately lands closer to a pile of reaction videos or a dazzling scientific breakthrough remains to be seen, but it certainly belongs in the tradition of permissible transformative uses.

The real debates about AI and copyright – how to ensure transparency, accountability, and fair compensation for artists – are worth having. But we can’t have them productively if we start with a flawed premise. Training AI is not the same thing as piracy. Understanding that distinction is the first step toward building a copyright system that both protects creators and enables innovation.

The Censorship Alarm Is Ringing in the Wrong Direction
https://publicknowledge.org/the-censorship-alarm-is-ringing-in-the-wrong-direction/ (Mon, 15 Sep 2025)

Instead of trying to influence laws across the Atlantic, Congress would serve American speech rights better by tackling the real censorship happening at home.

On September 3, Congress held a hearing with an alarming title: “Europe’s Threat to American Speech and Innovation.” The premise was that the European Union’s (EU) Digital Services Act (DSA) and the United Kingdom’s Online Safety Act (OSA) pose an existential threat to American free expression. Yet the evidence presented reveals a different story entirely – one where speech safeguards abroad are stronger than the actual speech threats advancing here at home, and where Congress is sounding alarms about unsubstantiated European censorship while ignoring real threats to the First Amendment in its own backyard.

The hearing followed the release of the House Judiciary Committee Republicans’ interim staff report entitled “The Foreign Censorship Threat: How The European Union’s Digital Services Act Compels Global Censorship And Infringes On American Free Speech” (hereafter, “the HJC report”). While Congress certainly has the authority to investigate how foreign regulations might affect American rights and companies, this report is riddled with conjecture, mischaracterizations, and inflammatory rhetoric found in similar “censorship cartel” materials that are readily debunked (and we have, here, here, and here). House Judiciary Democrats also did their own debunking of the Republicans’ “Misleading Report on the EU’s Digital Services Act.”

Ironically, a handful of bills introduced in Congress with bipartisan backing reflect some elements of the DSA – including requirements for transparency in content moderation decisions and redress in case users feel their content has been mistakenly moderated. That’s not to say the DSA and OSA are perfect laws, but framing them as “censorship” misrepresents their intentional design as a balance between free expression and online safety – a balance we have been slow to strike here in the U.S.

Clarifying the DSA’s “Red Line” Against Censorship

What both the HJC report and Republicans in the hearing failed to understand is that the DSA contains what European legal scholars refer to as a “red line,” preventing the kind of arbitrary censorship the HJC report claims. While the EU does not have an American-style First Amendment, it does have a Charter of Fundamental Rights that protects free expression. The EU also has the European Convention on Human Rights, an older treaty which contains similar protection and applies to the individual member countries. As a result, EU regulators cannot restrict speech unless the law clearly specifies what regulators can and cannot do. In other words, the DSA cannot authorize content-specific restrictions using broad terms that lend themselves to abuse; restricted speech must be explicitly spelled out. And the EU can only act within powers specifically granted by member countries. It cannot claim authority over speech that member states have not delegated to it. So an individual EU commissioner cannot unilaterally decide a platform must deal with content in a certain manner. 

As more than 30 leading digital rights scholars recently explained in a letter to Judiciary Committee Chairman Representative Jim Jordan (R-Ohio), such principles create multiple independent grounds for European courts to strike down any attempt to use the DSA for viewpoint-based censorship. The DSA must be “content-agnostic” – meaning that for lawful content, regulators can only enforce content-neutral measures, like how platforms design their systems or how they empower users to control their own experience.

In fact, the HJC report points to an attempt by EU officials to overstep the DSA’s authority. However, rather than proving an incident of unilateral censorship power, the circumstance demonstrates how checks and balances work when officials overstep their bounds. In the lead-up to the 2024 US presidential election, Commissioner Thierry Breton threatened Elon Musk with DSA action for hosting an interview with President Trump, claiming it could incite violence, hate, and racism. He asserted broad authority to regulate “harmful content” and “amplification,” but those terms are not included in the DSA. He confused lawful but controversial speech with illegal content. Yet, European institutions have safeguards: Breton was condemned by civil society groups, his colleagues distanced themselves, and within two weeks, he resigned to avoid dismissal. 

The Brussels Effect and Localized Compliance

It is true that European Union regulations on tech companies can have a global impact. Known as the “Brussels Effect,” this happens because the EU is a large and wealthy consumer market backed by strong regulatory institutions. If a non-EU company wants access to that market, it must comply with EU rules, and those rules often influence corporate behavior beyond the EU’s boundaries. An example that has personally benefited me here in the U.S.: thanks to the EU-mandated standard of USB Type-C ports, consumers everywhere no longer need to buy new charging cables and adapters with each new Apple product.

The House interim report attempts to apply the Brussels Effect phenomenon to the DSA, predicting that online content originating from the U.S. would be moderated according to EU standards, thereby “censoring” the American user if said users run afoul of EU hate speech laws, for example. This prediction isn’t grounded in legal reality or current practice. The European Commission has explicitly clarified that “where content is illegal only in a given Member State, as a general rule it should only be removed in the territory where it is illegal.” The EU’s highest court backed this principle in Google v. CNIL (2019), ruling that EU privacy regulations didn’t require Google to block search results worldwide, only within the EU.

There is nothing in the DSA that requires platforms to moderate content that users in America can access. Reiterating this point, Henna Virkkunen, the EU’s Executive Vice President for Tech Sovereignty, Security, and Democracy, clarified in a letter to House Judiciary Chairman Jim Jordan that the DSA is “the sovereign legislation of the European Union, adopted with overwhelming majorities” and “applies exclusively within the European Union to all services provided therein, irrespective of the location of the provider’s headquarters.”

A Briefer on the DSA’s Requirements

The DSA does not require platforms to remove content outright. Instead, it mandates Very Large Online Platforms (VLOPs) to have an accessible reporting system for users to flag suspected violative content. When notices are received, VLOPs (platforms with more than 45 million users in the European Union) must promptly assess whether the content is illegal (Article 16). They are also required to regularly evaluate systemic risks, including the spread of illegal content, and implement proportionate mitigation strategies, such as their own content moderation processes. VLOPs should prioritize notices from “trusted flaggers” – vetted third-party experts in identifying illegal content – without delay (Article 22). When content is removed, users must be clearly informed of the reasons for the action, the legal basis, whether automation was involved, and how they can seek redress (Article 17). In emergency situations affecting public safety or health, the Commission may instruct VLOPs to undertake urgent measures, including enhanced content removal procedures (Article 36). 

The system emphasizes due process, requiring VLOPs to strike a balance between the effective removal of illegal content and the protection of fundamental rights, particularly freedom of expression. Users have various redress mechanisms if they believe content was wrongfully removed, including internal complaint systems and out-of-court dispute settlement – a redress system similar to one outlined in the Internet PACT Act, supported by Senators John Thune (R-SD) and Bill Cassidy (R-LA), among other lawmakers. In fact, as Public Knowledge noted, this redress system is one that could better facilitate free expression, giving users more agency in challenging content moderation decisions. Moreover, there are aspects of the DSA that would receive bipartisan backing if introduced in the U.S., including greater user agency over how platforms collect and utilize personal data and how algorithms are used to target users with advertising. For example, COPPA 2.0 – a bill cosponsored by Senator Chuck Grassley (R-Iowa) – would make it unlawful for platforms to target users under 18 with advertising using personal data.

Addressing Misconceptions in the House Report and the Hearing

In both the HJC report and Rep. Jordan’s remarks, it was stated that “even the New York Times” pointed out that the DSA addresses online speech in a way that would be “off limits in the United States” due to the First Amendment. Such framing misses the point. The U.S. and Europe have categories of speech with limited or no protection based on their respective histories. Neither tradition is “more democratic” or “more censorial” than the other. 

America’s First Amendment was born from a revolution against colonial authorities that restricted assembly, censored publications, and punished dissent. Europe’s approach reflects different lessons. In the wake of fascism, genocide, and mass propaganda campaigns that dehumanized entire groups, European societies became more willing to regulate hate speech to protect vulnerable communities’ ability to participate in public life. This is notably the case for Holocaust denial – a belief not uncommon on “free speech” platforms like X here in the U.S. – which would be a criminal offense in many European countries. For EU regulators, dignity and equal participation are co-equal democratic values, meaning that persistent harassment directed at marginalized groups is understood as a threat to their free expression. By contrast, U.S. First Amendment law gives the highest protection to political speech, lesser protection to speech that is ‘purely commercial,’ and no protection to obscenity or other forms of speech deemed harmful under the common law, such as defamation or fraud.

This difference is not about one side embracing “free speech” and the other rejecting it. It is about where each system draws the line between individual expression and collective harm. The U.S. system treats content-based restrictions as likely to violate the First Amendment, but allows varying degrees of “content neutral” restrictions based on a complicated balancing of what type of speech is regulated (e.g., commercial speech), the purpose of the content neutral restriction (for example, disclosing side effects of medications) and whether the regulation restricts more speech than necessary to achieve the purpose. The EU, on the other hand, based on its history, views certain categories of hateful speech as corrosive to democracy itself.

Public Knowledge’s view is that there are lessons here for U.S. policymakers. Allowing platforms to become channels for sustained harassment does not create a true marketplace of ideas; it drives targeted voices offline, chilling their ability to speak. Gamergate is a high-profile example: women in the U.S. gaming industry faced sustained, coordinated harassment campaigns that pushed them off platforms and silenced their voices. The DSA’s provisions requiring platforms to respond to illegal hate speech and harassment are not simply censorship, but a recognition that unrestricted hate speech can constitute harassment, and that addressing it is one way of preserving a broader range of voices online. By contrast, the U.S. punishes speech designed to harass individuals (such as personal threats) after the fact rather than attempting to prevent it in the first place.

This nuance often gets lost in political rhetoric. In a May 2025 multi-stakeholder workshop hosted by the European Commission, participants from government, civil society, academia, and industry explored various scenarios to determine whether a flagged post qualifies as illegal hate speech. Both the HJC report and Rep. Jordan, during the September 3rd hearing, referred to a hypothetical involving Amira, a “16-year-old Muslim girl.” She sees a post from @Patriot90 featuring a meme of a woman in a hijab with the caption ‘terrorist in disguise,’ accompanied by a comment saying, “We need to take back our country.” The report and Rep. Jordan object to labeling “we need to take back our country” as hate speech, deeming it “common political rhetoric.” However, both leave out additional context: that “the posts from @Patriot90 start to be more frequent and directed specifically at Amira.” The harm comes not from one slogan, but from the cumulative targeting of a young person based on her faith. In that context, ignoring harassment serves the powerful while silencing the marginalized. This is similar to how the U.S. criminalizes harassing someone by telephone – except that rather than prevent the harassment, we punish the harasser.

Clarifying DSA’s Requirements for Elections Monitoring and Fact Checking

The HJC report points to how the European Commission has “initiated formal proceedings against Meta for the ‘non-availability of an effective third-party real-time civic discourse and election-monitoring tool’,” and describes the move as punishing Meta “for failure to adequately censor election-related content.” This claim fundamentally misinterprets the DSA. First, the Commission launched “formal proceedings” because Meta failed to provide a third-party tool for monitoring election-related content, as required by the DSA, after the company decommissioned CrowdTangle, a tool used for real-time monitoring of online content. Second, the DSA doesn’t specifically define an “election monitoring tool,” but it does require VLOPs to address systemic risks related to electoral processes while protecting freedom of expression. It never requires that such a tool be used to flag or remove content; instead, platforms are expected to allow third-party access for monitoring election-related content and to ensure they follow their own content policies. The regulation aims to increase transparency in how platforms evaluate election-related risks.

Given the influence of foreign actors, particularly Russia, it’s understandable that the EU wants to protect its democratic processes, just as the U.S. does. In fact, Republican members of the House Oversight Committee recently expressed concern in a letter to the Wikimedia Foundation about whether the platform is effectively tracking and addressing foreign interference, including content from pro-Kremlin sources. If our own government is asking tech platforms to assess foreign influence operations aimed at manipulation, why should we criticize our EU counterparts for doing the same?

Similarly, the HJC report criticizes the Commission for opening “formal proceedings against X for choosing to use Community Notes rather than allow third-party fact-checkers to censor content.” For one, the Commission investigated X to ensure the then-new Community Notes system was effective in addressing illegal content and to verify its compliance with the DSA’s requirements. Further, fact checkers do not censor content; they add context to it, essentially expanding speech. The difference between a community note and a fact check is the mechanism: community notes rely on a “bridging” algorithm that surfaces added context only when raters from different political viewpoints agree it is helpful, while fact checks typically come from third-party services – often traditional media outlets, dedicated organizations, or academics – that identify and flag content that cannot be verified. Neither removes or downranks content (although platforms can voluntarily decide to moderate content based on a fact check or content flag). Fact-checking and community notes can work together to provide helpful clarification, especially during election season, when online grifters exploit inflammatory and false content to boost engagement and foreign adversaries ramp up influence operations to flood feeds with false information and propaganda.
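To make the “bridging” idea concrete, here is a minimal, illustrative sketch in Python. It assumes a simple matrix-factorization model in the spirit of the publicly documented Community Notes approach – each rating is explained by a global mean, a rater term, a note term, and a rater-note viewpoint interaction – so a note earns a high note term only when raters on different sides of the viewpoint axis agree it is helpful. The synthetic data, parameter values, and training loop are invented for illustration and are not X’s actual implementation.

```python
# Illustrative sketch only: a tiny matrix-factorization model in the spirit of
# bridging-based ranking. All data and hyperparameters are invented.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ratings: (rater_id, note_id, rating), 1 = "helpful", 0 = "not helpful".
# Note 0 is rated helpful by raters on both "sides"; note 1 only by one side.
ratings = [
    (0, 0, 1), (1, 0, 1), (2, 0, 1), (3, 0, 1),
    (0, 1, 1), (1, 1, 1), (2, 1, 0), (3, 1, 0),
]

n_raters, n_notes, dim = 4, 2, 1
mu = 0.0
rater_bias = np.zeros(n_raters)
note_bias = np.zeros(n_notes)                     # the "bridged helpfulness" term
rater_vec = rng.normal(0, 0.1, (n_raters, dim))   # latent viewpoint of each rater
note_vec = rng.normal(0, 0.1, (n_notes, dim))     # latent viewpoint appeal of each note

lr, reg = 0.05, 0.03
for _ in range(2000):                             # plain stochastic gradient descent
    for u, n, r in ratings:
        pred = mu + rater_bias[u] + note_bias[n] + rater_vec[u] @ note_vec[n]
        err = r - pred
        mu += lr * err
        rater_bias[u] += lr * (err - reg * rater_bias[u])
        note_bias[n] += lr * (err - reg * note_bias[n])
        u_v, n_v = rater_vec[u].copy(), note_vec[n].copy()
        rater_vec[u] += lr * (err * n_v - reg * u_v)
        note_vec[n] += lr * (err * u_v - reg * n_v)

# A note rated helpful across raters with opposing viewpoint vectors keeps a
# high bias term; a note whose support is one-sided has that support absorbed
# by the viewpoint interaction instead, so its bias term stays lower.
print("note helpfulness terms:", np.round(note_bias, 2))
```

The production system layers more on top of this (careful regularization, minimum rater counts, and score thresholds), but the core design choice is the same: cross-viewpoint agreement, not raw vote counts, determines whether a note is shown.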

The Censorship Call is Coming From Inside the House 

Nothing in the DSA compels platforms to apply their EU content moderation policies globally. Platforms can and do apply content policies based on geographic location, in accordance with local laws. European regulators cannot fine platforms for failing to moderate content for users based in the U.S., though they can request that platforms moderate a U.S. user’s content as it is presented in Europe. Nor is it as if Americans don’t fret about foreign speech spreading in the U.S. – such worries contributed to the broad bipartisan support for the TikTok ban, passed amid a panic that the Chinese Communist Party has undue influence over how content is presented to American users.

Ironically, some of the real speech restrictions the U.K. and EU are implementing have found bipartisan purchase here in the U.S. Namely, the U.K. Online Safety Act (OSA) began requiring platforms to verify users’ ages before they can access certain online content. As we wrote in August, the rollout of the OSA has been far from perfect, with platforms blocking access to broad swaths of content that, if you squint, may be inappropriate for some kids – and inevitably blocking adults from that content unless they submit privacy-invasive information to confirm their age. Similar age verification mandates are gaining ground in the U.S. Just recently, the U.S. Supreme Court gave the green light to a Texas law requiring age verification to access pornography, and declined to block a Mississippi law requiring strict age verification to use social media at all (although Justice Kavanaugh wrote a concurrence asserting that the law itself is “likely unconstitutional”).

If First Amendment rights are genuinely a top priority for Republican members of the House Judiciary Committee, they should focus on the numerous efforts by the Trump administration to suppress free speech. For example, Ranking Member Representative Jamie Raskin (D-MD), in his opening statement, highlighted President Trump’s frivolous and excessive lawsuits against disliked media outlets, the withdrawal of hundreds of millions of dollars in university grants due to ideological disagreements, the defunding of public broadcasting over its reporting, the installation of a “bias monitor” in the newly merged Skydance/Paramount company, and the Trump-directed Federal Trade Commission barring the newly merged Interpublic and Omnicom from refusing to advertise on platforms based on political content – and the list continues. As federal Judge Sooknanan stated in her decision granting a preliminary injunction against the FTC’s investigation into liberal watchdog Media Matters for America: “It should alarm all Americans when the Government retaliates against individuals or organizations for engaging in constitutionally protected public debate. And that alarm should ring even louder when the Government retaliates against those engaged in newsgathering and reporting.”

Conclusion 

Instead of trying to influence laws across the Atlantic, Congress would serve American speech rights better by tackling the real censorship happening at home.

Experts in platform regulation contend that “nothing about the EU’s Digital Services Act (DSA) requires platforms to change the speech that American users can see and share online.” While it’s true that some elements of the DSA – specifically, some of what it treats as illegal content – would be barred here in the U.S. by the First Amendment, the U.S. cannot override EU laws. The HJC report’s concerns stem from fundamental misunderstandings of the DSA’s scope and territorial limits. Instead, Congress might consider how the DSA’s transparency requirements, due process protections, and limits on targeted advertising to children reflect principles that already have bipartisan support here – from the Internet PACT Act to COPPA 2.0.

The post The Censorship Alarm Is Ringing in the Wrong Direction appeared first on Public Knowledge.

New Public Knowledge Paper Proposes Child Online Safety Framework Protecting Kids, Free Expression https://publicknowledge.org/new-public-knowledge-paper-proposes-child-online-safety-framework-protecting-kids-free-expression/ Thu, 21 Aug 2025 13:41:03 +0000 https://publicknowledge.org/?p=38238 New paper outlines how to prioritize making the internet safer and privacy-protective for everyone.

Today, we’re happy to announce our newest white paper, “The Kids Aren’t Alright Online: How To Build a Safer, Better Internet for Everyone,” by Public Knowledge Government Affairs Director Sara Collins and Policy Analyst Morgan Wilsmann. The paper discusses the need for policymaking that prioritizes making the internet safer and more privacy-protective for children and adults alike, then outlines how to get there.

The relationship between children and digital technology has become a contentious policy debate. As young people increasingly live their lives online, we see endless headlines chronicling alarming stories of cyberbullying, online predators, harmful content, and addictive design features. This tension has catalyzed a global wave of regulatory responses, from comprehensive privacy frameworks to age-appropriate design codes to outright bans on social media use by minors. Yet many of these proposals fail to balance protecting children online with preserving free expression, and risk cutting children off from the substantial benefits that technology platforms can offer.

This paper proposes a child online safety framework that forces technology companies to design platforms with children’s wellbeing as a primary consideration. This approach – often referred to as “safety by design” – shifts responsibility from individual users and their families to the corporations that profit from the use of their online platforms. Rather than asking children to navigate exploitative systems or parents to police every online interaction their kids have, we the people should demand that companies build platforms that are safe for everyone by default.

This paper is the product of two key feedback opportunities. The first was a January 2025 Public Knowledge Policy Konclave that brought together more than 20 subject matter experts from law, technology, policy, and academia to explore topics like age assurance, the current state of legislative activity, and how a risk-based approach to child safety could work. The second was a June 2025 workshop for high school students, held in partnership with the Civic Innovation Academy of Civics Unplugged, that focused on how digital technology is shaping students’ lives and how they wish the internet could change to serve them better.

You may attend our in-person launch event, “The Kids Aren’t Alright Online: Building a Safer, Better Internet,” at the InterContinental Wharf hotel in Washington, D.C. on September 8 for a paper presentation and discussion featuring expert panelists.

The following can be attributed to Sara Collins, Government Affairs Director at Public Knowledge:

“The internet has the potential to enrich children’s lives, but only if we design it with both their safety and their rights in mind. We need laws that create the right incentives: rules that require protective settings to be on by default; age verification only for features that are likely to cause significant harm; and transparency into how platforms operate. 

“Importantly, lawmakers must resist simplistic fixes like blanket bans, content restrictions, or a reliance on parental consent, which do little to address systemic design flaws and risk undermining children’s and adults’ freedom of expression and privacy. By adopting a risk-based approach, fostering proactive protective behaviors, and ensuring researchers can reveal what works, we can strike the right balance – protecting children online while preserving the benefits of the internet for everyone.”

You may view the paper here and register to attend the in-person launch event on September 8 at the InterContinental Wharf hotel in Washington, D.C. For more information on our proposal, see our latest blog post, “Weighing in on the Age Verification Debate: Risk-Based Approaches To Minimizing Harm for Child Users.”

The post New Public Knowledge Paper Proposes Child Online Safety Framework Protecting Kids, Free Expression appeared first on Public Knowledge.

Weighing in on the Age Verification Debate: Risk-Based Approaches To Minimizing Harm for Child Users https://publicknowledge.org/weighing-in-on-the-age-verification-debate-risk-based-approaches-to-minimizing-harm-for-child-users/ Thu, 21 Aug 2025 13:28:39 +0000 https://publicknowledge.org/?p=38230 The worst online harms to children stem not simply from exposure to indecent content, but from design features that facilitate harmful interactions or compulsive use.

Today Public Knowledge launched a new white paper, “The Kids Aren’t Alright Online: How To Build a Safer, Better Internet for Everyone,” to weigh in on the heated debate over child safety online. While lawmakers across the country are eager to pass legislation aimed at protecting young users online, many of these well-intentioned efforts are, in our view, overly restrictive and fail to address the underlying contributors to harm.

Our paper argues that the responsibility should be placed on technology companies to design safer products by default, while preserving the substantial benefits that platforms can offer to the learning, creativity, and social development of kids and adolescents. Among the policy recommendations in the paper’s child safety framework, one stands out as particularly relevant given recent legal and legislative momentum: risk-based age verification.

Age Verification v. Age Assurance

Before diving into this approach, it’s important to distinguish between two related but distinct concepts that often get conflated in policy discussions. Age verification refers to definitive processes that confirm a person’s exact age through official documents like government IDs. Age assurance, on the other hand, encompasses a broader range of methods to establish reasonable confidence about someone’s age range, including age verification as one option alongside less invasive approaches like age estimation technology, online behavioral analysis, or account history review. Our risk-based framework considers this spectrum of age assurance tools, applying more rigorous verification only where the risks truly warrant it, rather than defaulting to the most invasive approach for every online interaction, regardless of the user’s age.

The Supreme Court Green Lights Age Verification, But That May Not Be the Best Approach in All Circumstances.

This summer, in Free Speech Coalition v. Paxton, the U.S. Supreme Court overturned prior precedent on internet age-gating, upholding a Texas law requiring websites to verify users’ ages before allowing access to sexually explicit content. As Public Knowledge Legal Director John Bergmayer explained in his post, “Protecting Kids Shouldn’t Mean Weakening the First Amendment,” the legal reasoning in this decision lowers the level of constitutional scrutiny applied to content-based restrictions on adult speech. This decision may provide an opening for other vague, privacy-invasive platform policies under the pretense of “protecting the children.” In fact, a few weeks after the Free Speech Coalition decision, the Supreme Court declined to review a lower court ruling that upheld Mississippi’s law mandating social media platforms to verify users’ ages. While everyone can agree that access by minors to pornography has no social benefit and very real harms, the same cannot be said of all social media. Here, the balance of potential benefits and potential harms becomes much more complicated.  

While Public Knowledge would rather not see the government erect unnecessary barriers and friction within the Open Internet, there is clear momentum toward a federal law that mandates some sort of age assurance to access online platforms like social media. Rather than focusing on restricting access to lawful, perhaps even beneficial, content with “all or nothing” age verification mandates, age assurance mandates could be applied narrowly, targeting features and functionalities that pose a heightened risk to minors. If a federal age assurance mandate does materialize, policymakers should prioritize approaches that 1) are privacy-protective and 2) minimize the burden on adults seeking to access online content.

The Supreme Court decision may have given the go-ahead to the 24 states that have passed laws requiring age verification to access obscene-for-minors content online. Add the dozen states with age verification bills moving through their legislatures, and a majority of the country is blocking access to platforms with adult content. While these laws are largely focused on restricting access to pornographic content, their effects will likely be felt more widely. As other states follow the example of Mississippi and expand the range of platforms subject to age verification, adults across the United States will have to prove they are old enough to engage in online discourse, access speech, and express themselves online – in other words, people in America will have to demonstrate that they’re old enough to use the internet. Age-gating risks restricting free online expression for adults unless they share personally identifiable information, which can be easily hacked and leaked. We just saw this happen with the Tea Dating Advice app, a platform where women share details about their dates. The app required users to submit ID to verify they are women; 4chan users, perhaps predictably, hacked Tea’s data, leaking users’ government IDs to trolls online and exposing these women to harassment and potential fraud.

Learning From Our Peers

There will probably be a lot to learn from our friends across the pond, who are testing out a nationwide age verification requirement. The United Kingdom’s “Online Safety Act” went into effect in July of 2025; it requires social media and search services to use age assurance before users can access content deemed harmful to children (not just pornography, but content that promotes self-harm, eating disorders, suicide, and other destructive behaviors). So far, implementation has been less than perfect: platforms are struggling to build age verification processes that actually satisfy the law’s requirements. To be fair, as “Love Island” U.K. contestants often say, “It’s still early days.” But as of the writing of this article, over half a million people have signed a petition for the U.K. Parliament to repeal the “Online Safety Act.”

The “Online Safety Act,” like most U.S. age verification bills, does not specify how online platforms must verify user ages. Many platforms are offering a variety of verification options, ranging from biometric face scanning to uploading an ID or entering a payment card. These approaches have mixed efficacy and pose different tradeoffs. Biometric face scanning, for one, merely estimates age rather than verifying it, and has mistakenly classified adults as adolescents, restricting over-18s from adult-appropriate content (the opposite can happen, too: adolescents deemed to look older than their actual age can be granted access to adult content).

Many users have apparently chosen not to bother with any of this madness by using virtual private networks (VPNs) to disguise their IP address and geographic location, enabling them to access websites as if they are in a different country. As the BBC reported, half of the top 10 free apps in the Apple App Store in the U.K. in July were for VPN services, with one VPN app maker claiming it saw a 1,800% increase in downloads since the “Online Safety Act” went into effect. This is not unlike the increase in search traffic for information about VPNs that occurred once Virginia’s age verification law went into effect in 2023. 

There’s a reason free expression civil society groups hammer home that age verification requirements will almost certainly impede the speech of adults. The rollout of the U.K. “Online Safety Act” provides a key example: platforms under pressure to comply sweep up broad categories of content that could, if you squint, be indecent for children, all while cutting adults off from accessing that speech.

It is not simply a question of blocking adults from engaging in controversial speech. The all-or-nothing approach of age verification, combined with platform operators’ fear of prosecution, means that children will be blocked from non-harmful, even educational and useful, speech. People going online in the U.K. are documenting the parts of the internet that are unexpectedly blocked. Reddit, for one, blocked unverified users’ access to subreddits like r/stopsmoking and r/stopdrinking – communities that provide resources and support for healthier behaviors. Clips of protests were reportedly blocked on X, formerly Twitter. Spotify users find themselves unable to watch certain music videos or stream songs labeled 18+.

But is a kid perusing the subreddit r/stopsmoking meaningfully different from participating in an anti-smoking campaign at their school? Are teenagers made better off by remaining ignorant of protest movements in their country or around the world? Exposure to controversial content is not inherently harmful, and locking down this kind of material can strip young people of both knowledge and agency. Content gating, like the “Online Safety Act,” assumes all children are passive subjects needing constant control by their parents or the state. Public Knowledge rejects this framing and proposes a different solution – one that recognizes that the ability of children and adults to freely access information without government intervention is a First Amendment right worth preserving.

A Better Approach – Focus on Harmful Platform Features Rather Than Content

Age verification mandates may seem inevitable in the U.S., but there are still opportunities to influence their shape. We have shown that when these requirements are applied too broadly, they cause significant friction for users – particularly adult users – which incentivizes them to seek workarounds like VPNs. The difficulty of enforcing an age verification mandate also means regulators will concentrate on ensuring the law is properly implemented rather than pursuing actual harms to children.

In our view, rather than mandating blanket age gates, lawmakers should adopt a risk-based standard that scales requirements according to the potential harm of specific features. In our report, we focus intentionally on platform features – not content types – because defining what content is inappropriate for children is largely subjective. We believe content-based age gating would, as we’ve witnessed with the U.K.’s “Online Safety Act,” push platforms to impose overly broad restrictions on content to avoid liability, to the detriment of both adults and the children these laws are trying to protect.

In our view, the worst online harms to children stem not simply from exposure to indecent content, but from design features that facilitate harmful interactions or compulsive use, like nudge notifications, infinite scroll, and gamification. They also include features that connect users to unfamiliar contacts, like maps where you can see other users (Snap Map and the recently launched Instagram Map), options to quickly add suggested connections – even strangers – and the ability to send direct messages to users you are not connected with. Treating features, rather than content, as the conduits of harm gets at more insidious dangers that a content-focused approach does not address.

Rather than burdening all users with age checks just to view content online, lawmakers should age-gate risky features while leaving low-risk activities – such as basic content browsing or accessing educational material – unrestricted. Medium-risk features, like posting publicly on social networks, interacting with user-generated content (e.g., likes and comments), or receiving algorithmic recommendations, could rely on age estimation using existing data. Targeted advertising has long used demographic information to deliver campaigns to the intended audience, and large platforms are now implementing more formal, transparent age assurance tools.

For example, Google recently introduced a machine learning age estimation system that interprets account data to determine whether a user is likely over 18 – such as recognizing that a Gmail account created 15 years ago is unlikely to belong to a child – and allows users incorrectly labeled as under 18 to verify their age with a selfie or ID. Age estimation can serve as a practical compliance tool, but only if implemented with strong privacy safeguards that avoid expanded collection, retention, or sharing of personal data beyond what is strictly necessary. The appeals process must also be streamlined, transparent, and easily navigable for adults, ensuring lawful access to content and services is not unduly burdened. Poorly designed systems that impose excessive costs or technical requirements risk entrenching dominant platforms – which can more easily absorb these burdens – while disadvantaging smaller competitors and limiting consumer choice. Given that platforms benefiting financially from targeted advertising have used age estimation tools for years, requiring them to leverage these same tried-and-true methods to make their products safer for children is a reasonable regulatory burden.

High-risk features, like “going live” on a video stream or engaging in stranger-to-stranger messaging, could require stricter age assurance, such as biometric data or a government ID. Bluesky’s U.K. model, where unverified users can still browse feeds but must verify their age to access direct messaging and adult content, is one example of striking a better balance between free expression and child safety for high-risk features.
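To illustrate how this tiering might translate into product logic, here is a minimal sketch in Python. The feature names, tier assignments, and assurance methods are illustrative assumptions drawn from the examples above, not a prescribed implementation; a real system would also need privacy safeguards, data minimization, and an appeals path for misclassified adults.

```python
# Illustrative sketch only: feature names, tiers, and assurance methods are
# assumptions for demonstration, not a prescribed or complete implementation.
from enum import Enum

class Risk(Enum):
    LOW = "low"        # e.g., browsing feeds, reading educational material
    MEDIUM = "medium"  # e.g., public posting, likes/comments, algorithmic recommendations
    HIGH = "high"      # e.g., live streaming, messaging strangers

FEATURE_RISK = {
    "browse_feed": Risk.LOW,
    "read_educational_content": Risk.LOW,
    "public_post": Risk.MEDIUM,
    "algorithmic_recommendations": Risk.MEDIUM,
    "go_live": Risk.HIGH,
    "dm_unconnected_user": Risk.HIGH,
}

def required_assurance(feature: str) -> str:
    """Return the least invasive age-assurance step that fits a feature's risk tier."""
    risk = FEATURE_RISK.get(feature, Risk.MEDIUM)  # unknown features default to caution
    if risk is Risk.LOW:
        return "none"             # unrestricted access
    if risk is Risk.MEDIUM:
        return "age_estimation"   # inferred from existing account signals
    return "verified_age"         # e.g., ID check or similar, with a workable appeals path

if __name__ == "__main__":
    for feature in FEATURE_RISK:
        print(f"{feature}: {required_assurance(feature)}")
```

The point of the structure is that the invasiveness of the check scales with the feature’s risk, and unknown features default to caution rather than to full identity verification.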

Conclusion

The risk-based age verification framework we’ve outlined here and in our white paper, “The Kids Aren’t Alright Online: How To Build a Safer, Better Internet for Everyone,” offers a smarter alternative to the blunt-force solutions that dominate policy debates on child safety online. By scaling verification requirements to actual risk, rather than treating all online experiences as equally dangerous, we can protect child users while preserving their rights to learn, explore, and connect online – without punishing adults for accessing the internet.

But age verification is just one piece of creating a truly safer internet for children. Real progress requires addressing the underlying design choices and business practices that prioritize engagement over well-being; implementing privacy-protective defaults; and integrating safety into products from the ground up. Our white paper provides lawmakers with recommendations reflecting these principles. You can also join us in Washington, D.C. for a paper presentation and discussion at our September 8 event, “The Kids Aren’t Alright Online: Building a Safer, Better Internet.”

The post Weighing in on the Age Verification Debate: Risk-Based Approaches To Minimizing Harm for Child Users appeared first on Public Knowledge.

Trustless, Not Truthless: Strengthening Media Literacy for the Web3 Era https://publicknowledge.org/trustless-not-truthless-strengthening-media-literacy-for-the-web3-era/ Thu, 07 Aug 2025 15:18:54 +0000 https://publicknowledge.org/?p=38198 When no one controls the message, everyone must learn to read it.

Aristotle once said, “It is the mark of an educated mind to be able to entertain a thought without accepting it.” In today’s context, an educated mind is one able to navigate an information environment increasingly saturated with false, misleading, or even harmful content without accepting it. At a time of deepening ideological divides, media consolidation, and eroding trust in global institutions, this blog post describes how Web3 can come to epitomize an era of greater autonomy, transparency, and accountability – as long as users have the critical thinking skills necessary to navigate the new digital terrain. In fact, Web3’s structure might even be uniquely suited to facilitate the media literacy capabilities of its users.

“Media literacy” was first defined in 1992 as “having the ability to access, analyze, evaluate, create, and act using all forms of communication.” The definition for media literacy has since broadened to include understanding the systems in which media messages exist, their influence on our beliefs and behaviors, and the creation of responsible, thoughtful, and safe content. On the other hand, the federal government defines “digital literacy” as the ability to use digital technology to locate, evaluate, organize, create, and share information and encourages considerate and informed participation online. Given that these terms are often conflated, this blog post frames “digital literacy” as a natural extension of media literacy, serving as an umbrella term that encompasses new competencies required for navigating emerging technologies and overcoming challenges associated with digitization and decentralization.

What is Web3 and why is it uniquely suited for fostering media literacy skills? 

Web1 was the earliest version of the web (1990s to early 2000s), characterized by “read-only” static websites. Web2 refers to the second generation of the internet, defined by user-generated content, social networking, and centralized control by a handful of powerful technology companies like Meta, YouTube, TikTok, X (formerly Twitter), and Google. Today’s Web2 landscape presents significant risks to media literacy – these companies offer free services in exchange for user data, but their ad-based business models prioritize engagement over accuracy, amplifying emotionally charged and often misleading content to maximize profit. Despite growing concerns over the immense power and influence concentrated in a handful of companies, antitrust enforcement has not kept pace with consolidation in digital markets.

In Web2, centralized oversight and opaque algorithms allow toxic and misleading content to flourish, eroding trust and undermining informed engagement. Algorithms also enable communicators to test which headlines or messages perform best in real time, further incentivizing sensationalism over substance. Compounding these challenges, artificial intelligence is upending the information ecosystem and transforming how people access news. AI tools now generate summaries of entire news stories and often appear first in search results – shaping public understanding without a human in the loop. As a result, those with low levels of digital and media literacy may be ill-equipped to confirm the accuracy or legitimacy of information served through AI summaries, especially when those summaries do not link to the original reporting. This is especially true for adults with low literacy, who also tend to be frequent social media users, leaving them vulnerable to inaccuracies and turning them into conduits for the spread of misinformation.

Web2 platforms like Meta, YouTube, and X (formerly Twitter) have begun experimenting with peer-driven fact-checking features like Community Notes, but these tools expose the growing pains of user-driven models. Although the introduction of peer-moderation tools like Community Notes is intended to signal a commitment to accuracy and transparency, the platforms’ real priorities still center on keeping users engaged rather than informed. Emotionally charged content gets more clicks and shares, so it’s promoted more heavily.

Web3 potentially shifts power over how content is organized from Big Tech companies to individuals. As users graduate from being subjects of moderation to co-overseers of the information ecosystem, critical thinking and discernment become essential. Web3’s decentralized architecture can embed the building blocks of media literacy into the infrastructure itself by making the provenance of digital content – who created it, how or whether it has been modified, and when it was published – both visible and verifiable.
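As a concrete (and simplified) illustration of that provenance idea, the sketch below uses plain content addressing: a post is referenced by the hash of its bytes, so any later modification is detectable. This is a stdlib-only Python example under assumed, simplified conditions; real decentralized systems add cryptographic signatures, timestamps, and richer metadata (think IPFS-style content identifiers or C2PA manifests).

```python
# Illustrative, stdlib-only sketch of content addressing: a record is referenced
# by the hash of its bytes, so any later edit breaks the reference. Real systems
# add signatures, timestamps, and richer metadata.
import hashlib
import json

def content_id(data: bytes) -> str:
    """Return a stable identifier derived only from the content itself."""
    return hashlib.sha256(data).hexdigest()

def make_record(author: str, body: str) -> dict:
    """Bundle a post with its content-derived identifier."""
    return {"author": author, "body": body, "cid": content_id(body.encode("utf-8"))}

def verify_record(record: dict) -> bool:
    """Check that the body still matches the identifier it was published under."""
    return content_id(record["body"].encode("utf-8")) == record["cid"]

if __name__ == "__main__":
    post = make_record("example_user", "Original report, published 2025-08-07.")
    print(json.dumps(post, indent=2))
    print("untampered:", verify_record(post))   # True
    post["body"] = "Subtly edited claim."
    print("after edit:", verify_record(post))   # False: the provenance check fails
```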

Additional autonomy turns off potential Web3 users and exacerbates digital divides

In Web2, users are typically treated as passive consumers of algorithmically curated content, but Web3 presents a new participatory, transparent, and self-governed digital environment. The challenge we face isn’t just identifying incorrect health advice; rather, it’s determining whether a video of a politician is AI-generated or real, or whether a “breaking news” post is credible journalism or a coordinated disinformation campaign. In the Web3 context, those who are better equipped to identify manipulated media and ragebait can organize their feeds to prioritize quality information. At the same time, disparities in media literacy capabilities may exacerbate the digital divide, as users who are not equipped to navigate decentralized platforms and control their feeds remain more vulnerable to toxic information systems.  

New platforms like Mastodon, Bluesky, and Gab, along with blockchain-based sites, are free from centralized control, so there is no single person or group to blame for moderation. These forums allow individuals to decide for themselves what they’d like from their internet experience and to establish their own content and engagement policies in accordance with their values and preferences. The transparency, and the onus placed on users themselves, increases legitimacy and lowers suspicion of hidden agendas or ideological bias by Big Tech gatekeepers.

Despite all of its promise, Web3 isn’t necessarily user-friendly. Increased freedom comes with increased responsibility – users on these new sites are now tasked with curating their “ideal internet” and customizing their feeds and digital interactions. Many Web2 users are unfamiliar with the added responsibilities associated with decentralized platforms, and the lack of intuitive design or familiar features in Web3 often deters them from managing feeds, participating in collaborative moderation, or making an account altogether. The success of decentralized peer moderation depends on trust, visibility, and feedback loops. Without the internal fact-checking and trust-and-safety teams of major Web2 platforms, Web3 ecosystems risk being flooded with misinformation and bad actors – especially if peer-driven systems fall behind the speed of information sharing. They also risk losing relevance altogether if users feel their contributions have no impact on the democratic internet they hoped to create.

Congress should focus on the literacy we need for the internet we want

Policymakers may feel pressured to regulate how platforms deal with information, whether through algorithm mandates or threats to sunset Section 230. But instead of regulating how users consume information, they should focus on the promising applications of Web3 technologies for helping people navigate our toxic information systems, and prioritize supporting and uplifting these new technologies rather than controlling how platforms handle content. To rebuild public trust, support informed regulation, and enable researchers, civil society, and the public to better understand how information both flows and is controlled online, Congress should focus on media literacy-related legislation, such as:

  • Passing the Investing in Digital Skills Act and the Adult Education WORKS Act, and supporting media literacy education efforts tailored to distinct populations like youth, older adults, and communities with varying digital access or political exposure;
  • Funding the development of tools like the Misinformation Susceptibility Test (MIST) and other frameworks that measure misinformation exposure and resilience;
  • Directing the National Institutes of Health, National Science Foundation, U.S. Department of Education, Federal Communications Commission, and others to conduct longitudinal research on how trust, digital habits, and demographic factors shape misinformation vulnerability in decentralized spaces; and,
  • Requiring digital platforms to periodically disclose how they moderate, label, and prioritize content.

Web3 is not a panacea, but a rapidly evolving frontier that brings both new opportunities and complex challenges. For Big Tech’s business model and global success, virality trumps accuracy. Unlike Web2, where platforms filter and prioritize content, Web3 places that responsibility directly on users. In decentralized environments, where centralized enforcement is absent or limited, media literacy becomes the critical safeguard. Greater freedom means greater vulnerability to misinformation, manipulation, or exclusion. Without engaged communities or platform support, peer moderation can fail. Strong media literacy skills are crucial to help people navigate this era of autonomy and participate in democratized digital spaces, where it falls on them to navigate, verify, and contribute to trustworthy digital ecosystems. A decentralized internet is only truly empowering if all users are equipped to make sense of it.

The post Trustless, Not Truthless: Strengthening Media Literacy for the Web3 Era appeared first on Public Knowledge.

2025 Emerging Tech https://publicknowledge.org/2025-emerging-tech/ Mon, 02 Jun 2025 17:25:05 +0000 https://publicknowledge.org/?p=37991 Emerging Tech brings together public interest advocates, policymakers, and companies to discuss the promise, pitfalls, and policy implications of cutting-edge tech.

Public Knowledge hosts Emerging Tech to bring together public interest advocates, policymakers, and companies on the cutting edge of technology. The event helps inform policymakers about the promise, potential pitfalls, and policy implications of fast-moving tech dominating headlines and impacting our society in new ways.

We invite industry experts from across the country to join us for an all-day discussion as well as a reception and tech showcase, where guests can experience the latest technologies first-hand and get to know the people driving the next iteration of the internet.

In 2025, we convened a targeted, in-depth gathering aimed at fostering deeper engagement, collaboration, and policy conversations between policy advocates and emerging technology audiences. This year, we explored the policy implications and opportunities within frontier technologies, emphasizing the decentralized web, generative AI, and XR innovations.

Public Knowledge Interim CEO Meredith Whipple offered opening remarks to highlight the impact emerging technologies can have on society. She then introduced Public Knowledge Senior Policy Counsel Nicholas Garcia, who began the conversation by providing an overview of themes for the day, how the activities would be structured, and what guests could hope to achieve by participating in this unique, shared experience. After leading table introductions, the event then turned to the first discussions of the day.

Ross Schulman, Software Engineer at SpruceID; Jay Stanley, Senior Policy Analyst at the American Civil Liberties Union; and Nick Pickles, Chief Policy Officer at Tools for Humanity give lightning talks on Decentralized Digital Identity at Emerging Tech.

Speakers from SpruceID, American Civil Liberties Union, and Tools for Humanity joined Public Knowledge moderator and Senior Policy Counsel Nicholas Garcia in a series of lightning talks on Decentralized Digital Identity. A Q&A followed the lightning talk to give participants a chance to engage the speakers directly.

Ivan Sigal, Interim Director at Free Our Feeds and Renée DiResta, Associate Research Professor at the McCourt School of Public Policy at Georgetown, participate in a fireside chat on Decentralized Social Media at Emerging Tech.

Speakers from Free Our Feeds and Georgetown University then led the workshop in a fireside chat on Decentralized Social Media: Success Stories and Future Potential. The fireside chat kicked off a networking lunch with activities designed to help guests better connect with each other and the themes of the discussion.

Lacey Strahm, Head of Policy at OpenMined; Anna Lenhart, Policy Fellow at Institute for Data, Democracy, and Politics (GWU); Josh Levine, Research Fellow at Foundation for American Innovation; and Lorelei Kelly, Founder of Georgetown Democracy, Education + Service give lightning talks on Artificial Intelligence.

Speakers from OpenMined, the Institute for Data, Democracy, and Politics at George Washington University, the Foundation for American Innovation, and Georgetown University joined Public Knowledge moderator and Senior Policy Counsel Nicholas Garcia in a series of lightning talks on Artificial Intelligence. A Q&A followed the lightning talks to give participants a chance to engage the speakers directly. Public Knowledge then led a table discussion on the themes and questions raised during the lightning talks.

Following the lightning talks, Helen Toner, Director of Strategy and Foundational Research Grants at Georgetown’s Center for Security and Emerging Technology, keynoted the event by addressing the Unresolved Debates About the Future of AI. A Q&A and table discussion followed her commentary.

Jack Henderson, Chief Operating Officer at RadicalxChange Foundation; Sam Hammond, Chief Economist at the Foundation for American Innovation; Judith Donath, Faculty Associate at Berkman-Klein Center (Harvard University); and Larry Williams, Jr., the President of UnionBase give lightning talks on AI.

Speakers from the RadicalxChange Foundation, the Foundation for American Innovation, the Berkman-Klein Center at Harvard University, and UnionBase joined Public Knowledge moderator and Senior Policy Counsel Nicholas Garcia in a second series of lightning talks on Artificial Intelligence. A Q&A followed the lightning talks to give participants a chance to engage the speakers directly. Public Knowledge then led a table discussion on the themes and questions raised during the lightning talks. Finally, the event culminated in an activity where guests drafted headlines from the future based on what they had discovered or found most captivating from the day’s conversations.

Multiple companies and organizations joined the event to showcase some of the latest technology discussed in the panels. This showcase gives guests the opportunity to experience these emerging technologies directly so they can begin grasping both the practical and policy potential of these devices and how they might impact our society. You can glimpse the tech showcase for 2024 and even 2023 to learn more about this unique display.

In 2025, we were joined by:

Filecoin Foundation for the Decentralized Web, a nonprofit stewarding the development of open-source software and open protocols for decentralized data storage and retrieval networks, featured Starling Lab’s framework for data integrity.

XR Association (XRA) exhibited virtual reality (VR) hardware that let attendees jump into immersive experiences, including Ray-Ban Meta smart glasses and Meta Quest VR headsets.

Reach out to Michele Ambadiang, our Events Manager, to sponsor this event, join the panel discussions, or participate in the tech showcase in 2026 to share your own world-changing technology. View our YouTube channel to experience more of this momentous event.

The post 2025 Emerging Tech appeared first on Public Knowledge.

Why Decentralized Social Media Matters https://publicknowledge.org/why-decentralized-social-media-matters/ Thu, 29 May 2025 17:23:41 +0000 https://publicknowledge.org/?p=37979 Leaving a social media platform often forces content creators to start over from scratch on a new site. But what if that didn't have to be?

Ever since Elon Musk acquired X, formerly known as Twitter, the social networking site has been losing users. Waves of articles and tweets have emerged outlining the various reasons why users are fleeing the platform: dissatisfaction with the new owner, the devolving content moderation system, the throttling of news content, the reinstatement of previously banned users, the right-wing bias of the algorithm, the monetization of racist, antisemitic content, and many more.

While there has been extensive literature over the years documenting users’ negative feelings about their social media usage, it is notable that it took such a seismic shift in leadership to actually make users quit one of the most prominent media platforms. Surveys of people trying to quit social media point to the many benefits that it provides them – social connection, finding new job opportunities, and a general sense of keeping up with the news – all of which disappear when they leave. And for a platform that has come to represent culture online, the loss of over a decade’s worth of information is no less felt. The preservation of billions of 140- and 280-character tweets, gifs, memes, informational threads, useful hacks, and sassy clapbacks has proven challenging, especially after the Library of Congress stopped preserving every tweet in 2017.

This is not the first time that a platform has lost a lot of information capital. In 2009, Yahoo! shut down GeoCities in the U.S., one of the earliest platforms that allowed for easy creation, hosting, and discovery of user-created websites. At the time of its closing, there were about 38 million pages on the service, representing millions of hours of work by users engaging in free expression online through the creation and customization of personal webpages. Though the Internet Archive and Archive Team launched a movement to preserve as much of the content as possible, eventually releasing about 1TB of data representative of the early internet’s digital culture, much of the data was still lost.

The history of social networking is filled with people populating vibrant spaces, creating communities, finding innovative ways of self-expression, and then being forced to abandon these painstakingly created digital havens due to platforms failing from economic pressures or just plain ol’ bad management.

What if, then, people could take their stuff with them to the next place they inevitably have to move?

Decentralization could theoretically be a way to make switching platforms easier without having to wait for user experiences to get unbearably bad. Instead of starting from scratch at the next governance disagreement, users could take their network and their content with them when they want to leave. 

Challenging Platform Power Through Decentralization 

Decentralization means that there isn’t one single entity responsible for making rules for the entire ecosystem – each community can decide to create and follow its own set of practices.

Right now, who you talk to on the social internet depends a lot on the platform you use. Your Facebook friends stay on Facebook; your X followers stay on X. There is no way to broadcast to someone using X from your Facebook account – you have to log into the app to communicate with them. This is called platform lock-in, and it is part of what gives these tech platforms their outsized power.

Essentially, the platform holds your community captive and gets to set the rules for how you interact with them.

The Fediverse is one solution for facilitating decentralization through technical means. All apps in the Fediverse are built upon a technical protocol (called ActivityPub) that separates your contact list from the client you use to interact with it. Instead of being owned and operated by a single entity or corporation responsible for governing an entire platform, there are multiple servers run by a variety of actors, each setting its own rules for how it will be run. Consider email, for example: it doesn’t matter whether you use Outlook; you can still send email to people using a Gmail or Yahoo address. What if you could actually talk to your X followers from Threads in this same way?

The difference between the user experiences, then, would not be the content you see or the people you follow, but how the information is presented to you. Platforms would have to compete on all the things that make up their user experience – the content moderation policies that make them comfortable to be on, the ease of use of their interface, and so on. Have an issue, then, with how X is run? Using the Fediverse, you can leave the platform and take your audience and years of shitposting with you to the next microblogging environment that feels better to use.
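For the technically curious, here is a minimal sketch of the plumbing that makes this kind of portability possible. It assumes a standard WebFinger endpoint and an unauthenticated ActivityPub actor fetch, which many Fediverse servers allow; the handle is hypothetical, and real clients add error handling, caching, and signed requests.

```python
# Illustrative sketch only: assumes a standard WebFinger endpoint and an
# unauthenticated ActivityPub actor fetch; real clients add error handling,
# caching, and (on some servers) signed requests. The handle is hypothetical.
import requests

def resolve_actor(handle: str) -> dict:
    """Resolve a handle like 'user@example.social' to its ActivityPub actor document."""
    user, domain = handle.lstrip("@").split("@", 1)
    # Step 1: WebFinger lookup tells us where the actor document lives.
    webfinger = requests.get(
        f"https://{domain}/.well-known/webfinger",
        params={"resource": f"acct:{user}@{domain}"},
        timeout=10,
    ).json()
    actor_url = next(
        link["href"]
        for link in webfinger.get("links", [])
        if link.get("rel") == "self" and "activity+json" in link.get("type", "")
    )
    # Step 2: Fetch the actor document, which lists the inbox, outbox, and
    # followers collections that any compatible client can use.
    return requests.get(
        actor_url, headers={"Accept": "application/activity+json"}, timeout=10
    ).json()

if __name__ == "__main__":
    actor = resolve_actor("someone@mastodon.social")  # hypothetical account
    print(actor.get("preferredUsername"), actor.get("inbox"), actor.get("outbox"))
```

Because the handle resolves to open, documented endpoints rather than to an account locked inside one proprietary app, any compatible client can use the same inbox, outbox, and followers collections – which is what lets users switch interfaces without abandoning their network.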

Decentralization presents significant challenges. 

The biggest challenge for the Fediverse – and decentralization at large – is user adoption. Even with all the benefits that it can provide, getting started with Mastodon is actually really hard.

If you’re a new user with no technical background, it may take some time and effort to familiarize yourself with the vocabulary and the mental models associated with Federation – during the start-up process, you have to decide which servers to be a part of, which communities to engage with, and more. It is a lot easier to outsource all of that thinking and just sign up on a platform when you have 3 minutes between whatever tasks you’re performing at work.

Further, the Fediverse may require much more active participation than traditional social media. An online community needs people responsible for customized moderation, for keeping the servers up and running, and for creating the content that makes all the surrounding work worth the effort. Currently, most servers are expensive to operate, run and maintained by volunteers, and community guidelines often depend on how much effort the moderators have decided to put in.

This need for active participation is not necessarily a bad thing – after all, an engaged society is a democratic society. And many proponents of the Fediverse (and current users), are active participants due to their belief in the principles of decentralization.

However, the real world presents a different story. Literature on online communities points to the 90-9-1 rule: 1% of users create most of the content, 90% are lurkers who do not participate, and the remaining 9% simply engage with content through a like or a share. There are similar stories from the open source world – the vast majority of open source code is maintained by volunteers and used without any contribution back to the developers or the community, which makes developer compensation extremely difficult.

If we want these projects to be sustainable – which we should, because even volunteers need ways of buying groceries – we need new business models for decentralized social media. They cannot all be passion projects that people devote time to outside their day jobs. The genius (and the peril) of digital advertising – which dominant platforms rely on – is that it lets platforms monetize the vast majority of lurkers by dividing them into ever more targeted buckets and selling them products, while still maintaining free access to the platform itself.

The business models for the Fediverse are trickier. For one, the Fediverse is not one thing – it is a collection of servers organized around interests – and the popularity of one server doesn’t necessarily translate into the upkeep of another. In traditional media models, like book publishing, the publishing house stays afloat by selling a mix of extremely popular bestsellers that then subsidize less popular books on the publishing schedule. When communities aren’t connected to each other, the success of one cannot directly benefit another.

Could these be problems of nascency? Potentially! It seems foolhardy to write off the Fediverse when it is still too new a concept to have been meaningfully monetized. With enough adoption, servers with similar missions and moderation policies could create cooperatives to negotiate with advertisers. Data cooperatives, like Resonate, a music streaming cooperative, are an example of people wresting back control of their data and putting it to community-approved use, so it’s hard to see why that couldn’t happen for the Fediverse.

Currently, the crowdfunding model seems to be the primary way in which instances support themselves, and we’re also seeing attempts at monetization through subscriptions. Existing independent companies like Ghost, WordPress, and Tumblr have all pointed to their Fediverse ambitions and are at different stages in their decentralization journeys. The question of business models is far from solved, but there does seem to be hope on the horizon.

It also remains to be seen how much the mainstream audience cares about the philosophy of decentralization. Network effects show that people go where people are. The ability to leave with your network may not be the deciding factor for people choosing their social media platforms. The rise of TikTok, for example, stands in direct contrast with the principles of decentralization, and yet growth in its user base shows no sign of slowing down. TikTok has platform lock-in, a very strong algorithmic recommendation feed, and a very passive discovery and consumption experience, and yet people flock to it. It is a lurker’s paradise.

However, the ability to port an audience over to friendlier servers is the very thing that makes decentralization attractive to the people who create content on the internet. The very people who help make these platforms successful are the ones with the most incentive to maintain direct control over the channels through which their communication occurs. The Fediverse is not a panacea for user empowerment – far from it – but it could theoretically provide a worthwhile alternative to the problem of platform control. New surveys show that people are looking for smaller, more intimate communities to be a part of online. Letting users and creators collectively decide what content they want to interact with might be the way to make the social web more intentional – and people more optimistic – about the time they spend online.

The post Why Decentralized Social Media Matters appeared first on Public Knowledge.

What Does Research Tell Us About Technology Platform “Censorship”? https://publicknowledge.org/what-does-research-tell-us-about-technology-platform-censorship/ Wed, 21 May 2025 15:46:23 +0000 https://publicknowledge.org/?p=37967 While Trump administration officials claim a "censorship cartel" is targeting conservatives online, the available data tells a different story.

The post What Does Research Tell Us About Technology Platform “Censorship”? appeared first on Public Knowledge.

Like many other stakeholders, Public Knowledge is preparing a response to a request for public comment from the Federal Trade Commission on the topic of “technology platform censorship.” The FTC’s request encourages respondents to reply to a series of questions by recounting ways platforms like Facebook, YouTube, and X (previously Twitter) may have disproportionately “denied or degraded” users’ access to services based on the content of the users’ speech or affiliations. The request appears to be part of an effort by the FTC, Federal Communications Commission, and Department of Justice to break up a “censorship cartel” that Trump administration officials claim systematically censors Americans’ political speech. Based on the submissions so far, the FTC can expect to receive hundreds, if not thousands, of anecdotal – and often anonymous – comments describing incidents that staff will probably not be able to verify.

To ensure our own comments to the FTC are rooted in evidence, we reviewed eight years of research on political content moderation. Our literature review included research studies and white papers from academics, journalists, whistleblowers, social scientists, and platforms going back to 2018. Our goal for this post is to provide a summary of this research and the conclusions we draw from it. 

Challenges of Researching Platform Content Moderation

Unfortunately, research investigating questions about algorithmic curation and bias – and content moderation in general – has been constrained by these challenges:

  • Limited collaboration between platforms and researchers; 
  • The difficulty of defining and quantifying bias in research design;
  • Frequent changes to the platforms’ feed-ranking algorithms; 
  • Controlling for platform features such as content personalization; and, 
  • In the absence of platform data, the need to work with user histories or web-scraped data that may reflect the user’s own preferences (such as channels or subscriptions).

If anything, the technology platforms have compounded these challenges over time by restricting access to their data: Meta shut down its CrowdTangle tool (researchers consistently say the company’s new “content library” does not provide the same insight), and X has restricted access and increased application programming interface, or API, fees for researchers. These barriers make it easier for conspiracy theories about content moderation to emerge and spread. Despite these challenges, clear themes emerged from the body of research.

Themes from Research Regarding Political Content Moderation

Our secondary research review showed these dominant themes (see the subsequent sections of this post for links to the relevant studies):

  • There is little empirical evidence that platforms disproportionately deny or degrade conservative users’ access to services or that conservative voices or posts are disproportionately moderated due to their speech or affiliations. 
  • If anything, platform algorithms advantage conservative, right-wing, or populist content because such content tends to be highly engaging, and because there are structural advantages for right-wing or populist political influencers on technology platforms. 
  • Some of the characteristics that make this content more engaging also make it more likely to violate platform content moderation policies. So when conservative or populist content is disproportionately moderated, it is because it is more likely to violate the platforms’ community standards and terms of service. That is, asymmetric moderation results from asymmetric user behavior. This dynamic crosses international borders. 
  • To the extent that platforms do disproportionately deny or degrade service based on the content of speech (even if it does not violate platform policies), it overwhelmingly impacts marginalized communities, including people of color, LGBTQ+ people, religious minorities, and women. This may be due to how content policies are crafted, bias in moderation algorithms and training sets, and/or automated content moderation systems that do not understand cultural context or language cues. For technology platform users in general, these automated systems are incapable of understanding political motivation or affiliation. 

Note: In order to focus on dominant themes, we didn’t include every study we reviewed in this post. We encourage readers to use the links provided to understand the methodology in each study, and the citations within each study to access more information and resources. 

There Is Little Empirical Evidence That Conservative Voices Are Over-Moderated 

Researchers at New York University Stern School of Business’s Center for Business and Human Rights produced what may be the most comprehensive review of available research (as of February 2021) addressing the claim that platforms are biased in their moderation of conservatives. These researchers also conducted various analyses and rankings using Facebook’s CrowdTangle tool in the 11-month run-up to the 2020 US election. They found that right-leaning Facebook pages contained the most-engaged-with posts; right-wing media pages trounced mainstream media pages in engagement; and Donald Trump beat all other US political figures on the same measure. Independent studies by NewsWhip and Media Matters for America, cited in the same review, also showed that right-leaning Facebook pages and media publications outperformed left-leaning pages or performed similarly. The researchers also recounted a study showing that on YouTube, “partisan right” channels like Fox News and The Daily Wire performed as well as, or better than, “partisan left” channels such as MSNBC and Vox on key measures.

Research also shows that outcomes users attribute to “bias” may actually be the result of a neutral product design. As one research study about Google Search in 2018 noted, it is “difficult to tease apart confounds inherent to the scale and complexity of the web, the constantly evolving metrics that search engines optimize for, and the efforts of search engines to prevent gaming by third parties.” This study found that the direction and magnitude of political “lean” in test subjects’ search engine results pages (SERPs) depended largely on the input query, not the self-reported ideology of the user. It also varied by component type on the SERP (e.g., “answer boxes”) and by the platform’s own ranking decisions. If anything, “Google’s rankings shifted the average lean of SERPs to the right.” Another study of Google Search from 2018 showed that conservative users of the platform did not fully realize how dependent their results were on the phrases they used in their search queries. Nor did they have a consistent or accurate understanding of the mechanisms by which the company returns search results. (In the authors’ view, there’s no reason to believe this would differ for liberal users.) A study published in The Economist in 2019 showed that Google’s search algorithm mostly rewarded reputable reporting. That is, the most represented sources were center-left and center-right, and results indicating “bias” were actually the result of the user’s search term.
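
For readers curious how audit studies typically quantify the “lean” of a results page, a common (simplified) approach is to assign each source an ideology rating and take a rank-weighted average, so that top results count for more. The sketch below uses invented ratings and a simple 1/rank weighting; real studies use their own source ratings and weighting schemes:

```python
# Invented ideology ratings on a -1 (left) to +1 (right) scale.
SOURCE_LEAN = {
    "example-left.com": -0.6,
    "example-center.com": 0.0,
    "example-right.com": 0.7,
}

def serp_lean(ranked_domains):
    """Rank-weighted average lean of a results page (simple 1/rank weighting)."""
    weight_sum, lean_sum = 0.0, 0.0
    for rank, domain in enumerate(ranked_domains, start=1):
        lean = SOURCE_LEAN.get(domain)
        if lean is None:       # unrated sources are skipped
            continue
        weight = 1.0 / rank    # higher-ranked results dominate the score
        weight_sum += weight
        lean_sum += weight * lean
    return lean_sum / weight_sum if weight_sum else 0.0

# The same sources in a different order produce a different overall lean.
print(serp_lean(["example-right.com", "example-center.com", "example-left.com"]))
print(serp_lean(["example-left.com", "example-center.com", "example-right.com"]))
```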

If Anything, Platforms’ Engagement-Based Design Advantages Right-Wing Content

The single biggest driver of the societal impact of platforms’ content moderation is rooted in human nature: People are wired to pay more attention to information that generates a strong reaction. Research studies have shown that engagement on social media is associated with, for example, increased negativity and anger; outrage and confrontation; or incivility and conflict. Platforms must maximize engagement (e.g., posting, dwelling, liking, commenting, sharing) to optimize profit because of their advertising-based business model. As a result, even modest tweaks to algorithms to increase engagement (such as one Facebook made in 2018 to emphasize “meaningful social interactions”) can end up amplifying provocative and negative content. And as we describe in the next section, research consistently shows that right-wing sources use this type of content more often, and more effectively, on digital platforms.
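
To make that mechanism concrete, here is a minimal sketch of an engagement-optimized ranker – not any platform’s actual formula, and the weights are invented – showing how heavily weighting comments and shares pushes provocative posts to the top regardless of their accuracy:

```python
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    likes: int
    comments: int
    shares: int

# Invented weights -- real platforms tune these constantly, but the logic is
# the same: interactions that signal strong reactions count for more.
WEIGHTS = {"likes": 1.0, "comments": 5.0, "shares": 10.0}

def engagement_score(p: Post) -> float:
    return (WEIGHTS["likes"] * p.likes
            + WEIGHTS["comments"] * p.comments
            + WEIGHTS["shares"] * p.shares)

feed = [
    Post("Measured local reporting", likes=120, comments=4, shares=6),
    Post("Outrage-bait hot take",    likes=80,  comments=90, shares=40),
]
# The outrage-bait post wins despite having fewer likes.
for p in sorted(feed, key=engagement_score, reverse=True):
    print(f"{engagement_score(p):>7.0f}  {p.text}")
```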

A study published in The Economist in 2020 aimed to determine what content Twitter’s algorithm (as the platform was then called) promoted. The researchers found that, compared to the previous chronological feed, Twitter’s new “relevant” recommendation engine favored inflammatory tweets that were more emotive and more likely to have come from untrustworthy or hyper-partisan websites. Another study The Economist published later that year focused on Facebook. It showed that the most prominent news sources on Facebook are significantly more slanted to the right than those found elsewhere on the web, and that right-wing content from Fox News and Breitbart garners more Facebook interactions than left-leaning news sites.

The aforementioned report from NYU’s Stern Center for Business and Human Rights also concluded that social platforms’ algorithms often amplify right-wing voices, granting them greater reach than left-leaning or nonpartisan content creators. The authors analyzed engagement data and case studies of content related to high-profile incidents, finding no sign of anti-conservative bias in enforcement, even around contentious events like the January 6 riot at the US Capitol. They noted that right-leaning content frequently dominates user engagement metrics, largely due to Facebook’s algorithmic promotion systems, which reward content that provokes strong reactions. In other words, because Facebook’s feed algorithm optimizes for engagement, and outrage-driven or partisan posts often generate more clicks and shares, conservative pages that specialize in such content tend to benefit disproportionately.

Media Matters has tracked engagement on social media through several studies dating back to 2018. These studies undermine the idea that Facebook, in particular, is biased in its content moderation and reinforce the idea that platform algorithms favor engagement above all. One nine-month study completed in 2020 found that partisan content (both left and right) did better than nonpartisan content on Facebook, but “right-leaning pages consistently earned more average weekly interactions than either left-leaning or ideologically nonaligned pages.” The findings were similar to those in studies Media Matters conducted in 2018 and 2019. Their research in 2021 showed these effects were actually compounded after Facebook tweaked its algorithm to reduce the prominence of news, civic, and health information, and after video became more popular on the platform.

A study from Politico and the Institute for Strategic Dialogue in 2020 showed that “right-wing social media influencers, conservative media outlets, and other GOP supporters dominate online discussions” around the Black Lives Matter movement and voter fraud, including in Facebook posts, Instagram feeds, Twitter messages, and conversations on two popular message boards.

A 2022 study from the Brookings Institution focused on YouTube, one of the first platforms to offer “recommendations” to users. It found that, regardless of the ideology of the study participant, the algorithm pushed all users in a moderately conservative direction.

Most publicly available data for Facebook shows that conservative news regularly ranks among the most popular content on the site, and Facebook has acknowledged that right-wing content excels at the engagement measures that drive algorithmic amplification. In the election year of 2020, study after study found that the Facebook posts with the most engagement in the United States – measured by likes, comments, shares, and reactions – were organic posts from conservative influencers outside the mainstream media. When asked about this dynamic, a Facebook executive noted, “Right-wing populism is always more engaging” and said that the content speaks to “an incredibly strong, primitive emotion” by touching on such topics as “nation, protection, the other, anger, fear.”

Twitter has also acknowledged that its algorithms favored right-wing content. In 2021, Twitter published its own study that “reveal[ed] a remarkably consistent trend: In six out of seven countries studied, the mainstream political right enjoys higher algorithmic amplification than the mainstream political left.” (This was before Elon Musk purchased the platform and rebranded it to X.) Germany was a notable exception. Twitter, at the time, acknowledged the results were problematic but could not determine whether certain tweets received preferential treatment because of how the Twitter algorithm was constructed or because of how users interacted with it.
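
A simplified way to read “algorithmic amplification” in studies like Twitter’s is as a ratio: the reach a group’s posts get in the personalized, ranked timeline divided by the reach the same posts get in a reverse-chronological baseline, where anything above 1 indicates a boost from the algorithm. A toy calculation with invented numbers:

```python
# Toy illustration of an amplification ratio: impressions a group's posts
# receive in the algorithmically ranked feed vs. a reverse-chronological
# baseline. All numbers below are invented.
def amplification(ranked_impressions: int, chronological_impressions: int) -> float:
    return ranked_impressions / chronological_impressions

groups = {
    "mainstream right": amplification(1_850_000, 1_000_000),  # 1.85x boost
    "mainstream left":  amplification(1_320_000, 1_000_000),  # 1.32x boost
}
for name, ratio in groups.items():
    print(f"{name}: {ratio:.2f}x relative to the chronological baseline")
```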

Another cross-national comparative study based on Twitter in 26 countries, published in 2025, also found that this pattern extends internationally and that certain political ideologies are linked to a higher likelihood of spreading misinformation. Specifically, politicians associated with radical right-wing populist parties – characterized by exclusionary ideologies and hostile relations to democratic institutions – spread more online misinformation than their mainstream counterparts. The authors concluded that misinformation should be “examined as an aspect of party politics, serving as a strategy designed to mobilize voters against mainstream parties and democratic institutions.”

More recently, a study of the role of social media in the February 2025 election in Germany showed that X, TikTok, and Instagram (a Meta platform) all disproportionately served right-wing content to nonpartisan users. Content shown across every platform tested displayed a right-leaning bias, including both content from the accounts the researchers set out to follow and content selected “For You” by the platforms’ recommender systems.

Besides the lift from the algorithms, conservative elites may also gain greater engagement on technology platforms due to structural advantages in how they use these platforms. One sociologist and professor noted in her 2019 book, “The Revolution That Wasn’t,” that “there is a lopsided digital activism gap that favors conservatives.” For example, online participation is greater with middle- and upper-class movements than their working-class counterparts, and conservative activists tend to come from higher income levels than progressives. Conservative groups, therefore, have more time and resources to invest in content and engagement, and their simple, powerful messaging focused on “freedom” and threats to America fits best with social media’s short attention span and character limits. 

A nationally representative survey of Americans conducted by Pew Research in 2024 shows another advantage that now accrues to conservative users: the growing popularity, distribution, and political orientation of news influencers. About one in five Americans now say they regularly get news from news influencers on social media. News influencers are defined as individuals who regularly post about current events and civic issues on social media and have at least 100,000 followers on any of the major social media platforms (Facebook, Instagram, TikTok, YouTube, and particularly X, which is the most common site for influencers to share content). According to Pew’s research, more news influencers explicitly present a politically right-leaning orientation than a left-leaning one in their account bios, posts, websites, or media coverage. Influencers on Facebook are particularly likely to prominently express right-leaning views.

Right-Wing Content Is More Likely To Violate Platforms’ Community Standards

One of the most consistent themes in the research on content moderation of political content is that what users may perceive as “biased” asymmetric moderation is actually the result of users’ own asymmetric behavior. Specifically, the research shows that conservative, right-wing, and populist platform users (the term varies by research project) are more likely to violate the platforms’ terms of service and/or community standards. Many of the examples date from 2020 and 2021, when platforms evolved their content moderation policies in response to the COVID-19 pandemic and the 2020 US election, both of which became highly politicized. In the interest of public health, safety, and democratic participation, most platforms selected authoritative sources of information such as the World Health Organization, Centers for Disease Control, and local election offices to calibrate their content moderation, up- and down-rank user content, fact-check and label content, and direct people to the latest available information. (For more details by platform in regard to COVID-19, see our blog post.) Users sharing information inconsistent with that of the authoritative sources selected by the platforms found themselves in violation of platform policies. Conspiracy theories, content that calls for violence against particular groups, and other forms of violative content incompatible with platform standards also resulted in disproportionate moderation.

For example, a study of 6,500 state legislators on Facebook and Twitter during the tumultuous time in 2020 and early 2021 (e.g., the pandemic, the 2020 election, and the January 6 riot at the US Capitol) showed that state legislators could gain increased attention on both platforms by sharing unverified claims or using uncivil language such as insults or extreme statements. The results affirm that platform algorithms generally favor content likely to get a strong reaction. However, Republican legislators were significantly more likely to post “low-credibility content” on Facebook and Twitter than Democrats, and Republican legislators who posted low-credibility information were more likely to receive greater online attention than Democrats. 

A new research report focused on X’s Community Notes program, now in preprint, examines whether there are partisan differences in the sharing of misleading information. The study is particularly relevant now that both Meta and TikTok have moved to community notes (user-sourced assessments of content) to add context to posts instead of third-party fact-checking partnerships. The researchers’ abstract highlights that posts by Republicans are far more likely to be flagged as misleading compared to posts by Democrats, and not because Republicans are over-represented among X users. Their findings “provide strong evidence of a partisan asymmetry in misinformation sharing which cannot be attributed to political bias on the part of raters, and indicate that Republicans will be sanctioned more than Democrats even if platforms transition from professional fact-checking to Community Notes.”

One 2020 study used YouTube as a lens to investigate whether the political leaning of a video plays a role in the moderation decisions for its associated comments. The researchers found that user comments were more likely to be moderated under right-leaning videos, but this difference is “well-justified” because the videos and comments are also more likely to have characteristics that violate the platform’s rules. These include extreme content that calls for violence or spreads conspiracy theories, or misinformation based on fact-checks. Or, the videos and comments have poor social engagement (such as a high “dislike” rate). Once these behavioral variables were balanced, there was no significant difference in moderation likelihood across the political spectrum.

A prominent study published in Nature showed that users estimated to be pro-Trump/conservative were, in fact, more likely to be suspended from Facebook than those estimated to be pro-Biden/liberal. However, this was because conservative users shared far more links to low-quality news sites – even when “news quality” was determined by groups of only Republicans – and they had higher estimated likelihoods of being bots. As noted above, Facebook’s recommendation algorithm maximizes for user engagement, and this study was one of several that found that misinformation content was more engaging to right-wing audiences. Facebook’s algorithm also appeared to rank misinformation more highly for right-wing users. (In other words, Facebook’s algorithm is doing what it is optimized to do: serve up more content that proves engaging to a particular audience.) The authors concluded that political asymmetry in moderation resulted from asymmetries in violative behavior, not politically biased content policies or political bias on the part of social media companies. This study was one of four that studied, with Facebook’s cooperation, the impact of Facebook’s recommendation algorithm during the 2020 US presidential election. 
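
The underlying logic – that a politically neutral rule applied to politically asymmetric behavior produces asymmetric outcomes – can be illustrated with a small simulation. All of the rates below are invented; the point is only that the rule itself never looks at politics:

```python
import random

random.seed(1)

# A neutral rule: suspend any account whose shared links come from low-quality
# domains more than 30% of the time. The only difference between the two
# simulated groups is how often they share low-quality links (rates invented).
LOW_QUALITY_SHARE_RATE = {"group_a": 0.10, "group_b": 0.35}
THRESHOLD = 0.30
LINKS_PER_USER = 50
USERS_PER_GROUP = 10_000

def simulate(group: str) -> float:
    suspended = 0
    rate = LOW_QUALITY_SHARE_RATE[group]
    for _ in range(USERS_PER_GROUP):
        low_quality = sum(random.random() < rate for _ in range(LINKS_PER_USER))
        if low_quality / LINKS_PER_USER > THRESHOLD:
            suspended += 1
    return suspended / USERS_PER_GROUP

for group in LOW_QUALITY_SHARE_RATE:
    print(f"{group}: {simulate(group):.1%} suspended under the same neutral rule")
```

Under these invented rates, the rule suspends almost no one in the first group and most of the second, even though it never considers ideology at all.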

Another group of data scientists and academic researchers who were given access to Facebook data regarding the impact of social media on elections and democracy in 2019 noted the same thing. They found that most of the high-profile examples of moderation of conservative content resulted from “more false and misleading content on the right” at a time when platforms were more aggressively moderating content related to elections. One of these researchers noted that, if anything, “Facebook’s algorithms could also be helping more people see right-wing content that’s meant to evoke passionate reactions.”

Researchers from the Observatory on Social Media at Indiana University, in their own comments to the FTC, described two studies they conducted that explored this question, one from 2017 and one from 2019. The studies “did not support claims of platform censorship.” The researchers noted, “The much simpler interpretation of the data is that the online behavior of partisans is not symmetric across the political spectrum.”

Platforms Are Most Likely To Degrade Access for Marginalized Communities

There is also a substantial body of research about the discriminatory impact of content moderation on marginalized communities, specifically people of color, LGBTQ+ people, religious minorities, and women. It was informed by a history of research designed to understand the impact of automated decision-making (in real estate, employment, financial services, and the like) on individuals who share characteristics protected by anti-discrimination legislation, including race, gender, and religion. Those systems, designed to profile individuals and make decisions about the allocation of economic opportunities, consistently showed the strong potential for bias in computational systems. In particular, they were shown to reproduce the historical, inequitable outcomes embedded in their training data and project them forward as predictions of future outcomes. (Public Knowledge wrote about how harms from the algorithmic distribution of content are too often concentrated on historically marginalized communities in this blog post. We have also researched and written extensively about moderating race on platforms and Section 230 and civil rights.)

In her 2018 book, Algorithms of Oppression, UCLA professor Safiya Umoja Noble used textual and media searches to show “how negative biases against women of color are embedded in search engine results and algorithms.” She shared the premise that the profit motives of platforms combined with their monopoly status lead to a biased set of search algorithms. In regard to content moderation, research has focused on how the various elements of content moderation – the drafting of policies, the methods of enforcement, and the vehicles for redress such as user appeals – often mean that the voices of marginalized communities are subject to disproportionate moderation while harms targeting them remain unaddressed and the perpetrators protected.

For example, a field study of actual posts from a popular neighborhood-based social media platform found that when users talk about their experiences as targets of racism, their posts are disproportionately flagged for removal as “toxic” by five widely used moderation algorithms from major online platforms, including the most recent large language models. In the same study, human users also disproportionately flagged these disclosures for removal. The researchers further demonstrated a chilling effect: simply witnessing these valid posts discussing experiences with racism getting removed made Black Americans feel less welcome online and diminished their sense of community belonging. 
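
For a sense of how such audits work in practice, the sketch below compares false-positive rates for a toxicity model on policy-compliant posts that disclose experiences of racism versus otherwise similar posts. The `toxicity_score` function here is a deliberately crude stand-in for whatever moderation model is actually being audited:

```python
# Sketch of a disparate false-positive audit for a toxicity classifier.
# `toxicity_score` is a toy stand-in for the model under test; a real audit
# would call the platform's actual moderation model instead.
TOY_TRIGGER_WORDS = {"slur", "racist", "racism"}  # crude stand-in behavior

def toxicity_score(text: str) -> float:
    words = set(text.lower().split())
    return 0.9 if words & TOY_TRIGGER_WORDS else 0.1

FLAG_THRESHOLD = 0.8

def false_positive_rate(policy_compliant_posts: list[str]) -> float:
    """Share of posts that violate no policy but would still be auto-flagged."""
    flagged = sum(toxicity_score(p) >= FLAG_THRESHOLD for p in policy_compliant_posts)
    return flagged / len(policy_compliant_posts)

# Two small sets of policy-compliant posts, matched as closely as possible:
disclosure_posts = ["A stranger shouted a racist slur at me on the bus today."]
control_posts    = ["A stranger shouted at me about the schedule on the bus today."]

print("disclosure FPR:", false_positive_rate(disclosure_posts))
print("control FPR:   ", false_positive_rate(control_posts))
# A persistent gap between these two rates, measured at scale, is the kind of
# disparate impact the field study describes.
```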

Another study was specifically designed to understand which types of social media users have content and accounts removed more frequently than others, what types of content and accounts are removed, and how removed content may differ between groups. The researchers found that three groups of social media users experienced content and account removals more often than others: political conservatives, transgender people, and Black people. However, the types of content removed from each group varied substantially. Consistent with the studies cited above, conservative participants’ removals often involved harmful content removed according to site guidelines (e.g., posts deemed offensive, COVID-19 claims inconsistent with those of authoritative sources, or hate speech), while transgender and Black participants’ removals often involved content related to expressing their marginalized identities. This content was removed even though it followed site policies or, at most, fell into content moderation gray areas.

There are multiple contributors to this double standard. A report from the Brennan Center for Justice in 2021 highlights 1) how content policies are crafted, 2) bias in automated moderation algorithms, 3) content filters that lack cultural context, and 4) an inability to detect language nuance as the key drivers. Algorithmic systems may attribute the use of words or phrases describing authentic experiences related to gender identity, racism, domestic violence, or mental health to violative behavior on the part of users. Human moderators may also bring their own biases: whether from a lack of training, time, or cultural understanding, they make false-positive calls on content related to racism and equity more often for some groups. In the report – which examined Facebook, Instagram, YouTube, and Twitter – the researchers found that “content moderation at times results in mass takedowns of speech from marginalized groups [communities of color, women, LGBTQ+ communities, and religious minorities], while more dominant individuals and groups benefit from more nuanced approaches like warning labels or temporary demonetization.” The implication is that marginalized voices face extra hurdles to free expression online.

More recently, a group of over 200 researchers signed on to a letter that “affirm[ed] the scientific consensus that artificial intelligence can exacerbate bias and discrimination in society,” noting that “thousands of scientific studies” have shown that AI systems may violate civil and human rights even if their users and creators are well-intentioned. 

Summary of Research Conclusions

In conclusion, empirical research over the past decade reveals that social media content moderation has not always been neutral in its social or political impact. But it is marginalized voices that often bear a disproportionate burden – whether through higher rates of wrongful content removal, diminished reach in algorithmic feeds, demonetization, or threats to free expression that come from harassing, hateful, and false information posted by others online. Conversely, disproportionate moderation of conservative, right-wing, or populist content generally results from asymmetric compliance with platform community standards and terms of service.

The post What Does Research Tell Us About Technology Platform “Censorship”? appeared first on Public Knowledge.
