On December 27, 2023, the New York Times filed what may become the most consequential intellectual property lawsuit of the 21st century. The newspaper sued OpenAI and Microsoft in the Southern District of New York, alleging that millions of its copyrighted articles were used, without permission or payment, to train the large language models powering ChatGPT and Microsoft Copilot. The Times is seeking billions of dollars in statutory and actual damages, claiming the defendants "used The Times's content to create artificial intelligence products that compete with and threaten to divert audiences away from The Times."
The lawsuit didn't emerge from nowhere. According to reporting by the Times itself, the newspaper spent months in licensing negotiations with OpenAI before filing suit. OpenAI reportedly offered $1 million to $5 million per year for a licensing deal, a figure the Times deemed insultingly low given the scope of content ingested and the value it contributed to ChatGPT's capabilities. When talks collapsed, the Times chose litigation.
But the Times case is just the tip of the iceberg. A sprawling constellation of lawsuits, legislative proposals, and corporate licensing deals is reshaping the relationship between AI companies and the creators whose work fuels their models. The outcome will determine whether AI training on copyrighted content is the defining fair use case of the digital age or the largest act of mass copyright infringement in history.
The Lawsuit Landscape
The legal offensive against AI companies has come from nearly every corner of the creative economy:
- Getty Images v. Stability AI (filed January 2023 in the UK High Court and February 2023 in US federal court): The stock photo giant alleges Stability AI copied 12 million images from Getty's library to train Stable Diffusion. Getty's complaint includes striking examples of AI-generated images that reproduce Getty's distinctive watermark: strong evidence, Getty argues, that its copyrighted images were directly ingested. The case is proceeding in both the UK and Delaware.
- Authors Guild v. OpenAI (filed September 2023): A class action on behalf of thousands of authors, including John Grisham, Jodi Picoult, George R.R. Martin, and Jonathan Franzen, alleging that OpenAI trained GPT models on pirated copies of their books obtained from shadow libraries like Library Genesis and Z-Library. The complaint cites ChatGPT's ability to produce detailed summaries and passages closely resembling the plaintiffs' published works.
- Silverman v. Meta Platforms (filed July 2023): Comedian Sarah Silverman and authors Christopher Golden and Richard Kadrey sued Meta, alleging that LLaMA was trained on their copyrighted books obtained from pirate sites. A federal judge dismissed most of the claims in November 2023 but allowed the core copyright infringement allegation to proceed.
- Concord Music v. Anthropic (filed October 2023): Major music publishers sued Anthropic for allegedly reproducing copyrighted song lyrics in Claude's outputs. The complaint includes examples of Claude generating near-verbatim lyrics to hundreds of copyrighted songs.
- Visual artists class actions: Multiple suits filed by artists including Sarah Andersen, Kelly McKernan, and Karla Ortiz against Stability AI, Midjourney, and DeviantArt, challenging image generation models trained on billions of copyrighted artworks scraped from the internet.
As of early 2026, more than two dozen major AI copyright lawsuits are pending in US federal courts alone, with additional cases proceeding in the UK, EU, and Japan. No case has yet reached a definitive appellate ruling on the core question: does training an AI model on copyrighted works constitute fair use?
The Fair Use Battleground
The legal argument at the center of every AI training case is the fair use doctrine: the provision in US copyright law (17 U.S.C. § 107) that permits limited use of copyrighted material without permission for purposes such as criticism, commentary, education, and research. Courts evaluate fair use under four factors:
- Purpose and character of the use: Is the use "transformative"? Does it add new meaning, expression, or message? AI companies argue that training a model on text or images is fundamentally transformative because the model doesn't store or reproduce the original works; it learns statistical patterns. Critics counter that when ChatGPT or Stable Diffusion produces outputs that closely resemble copyrighted works, the "transformation" argument collapses.
- Nature of the copyrighted work: Creative works (novels, art, music) receive stronger protection than factual compilations. Most AI training data includes highly creative works, which weighs against fair use.
- Amount and substantiality: AI companies ingested entire works (complete books, full-resolution images, whole articles), not excerpts. Courts have historically viewed copying of entire works skeptically under fair use analysis.
- Market effect: Perhaps the strongest argument for creators. If AI-generated content substitutes for the original works, replacing freelance writers, stock photographers, and illustrators, the market harm is direct and quantifiable. The Times argues that ChatGPT answers that reproduce its journalism directly cannibalize its subscription revenue.
AI companies point to the Supreme Court's Google v. Oracle (2021) decision, which found that Google's copying of Java API declarations for Android was fair use because it was transformative. But legal scholars note critical differences: Google copied functional code interfaces, not creative expression, and the amount copied was a tiny fraction of the whole. AI training involves wholesale ingestion of creative works.
The closest precedent may be Authors Guild v. Google (2015), where the Second Circuit ruled that Google Books' scanning and indexing of millions of books was fair use because the search snippets it displayed were transformative and didn't substitute for the books themselves. AI companies cite this case aggressively, but Google Books never generated competing text. ChatGPT does.
The Opt-Out vs. Opt-In Debate
While courts adjudicate the legality of past training, a parallel battle rages over the rules for future data collection. The core question: should creators have to actively prevent AI companies from using their work (opt-out), or should AI companies need explicit permission (opt-in)?
The current system overwhelmingly favors AI companies. Web crawlers used to build training datasets typically follow (or ignore) the robots.txt protocol, a decades-old standard that was designed for search engine indexing, not AI training. In 2023, OpenAI introduced its GPTBot crawler and published instructions for website operators to block it via robots.txt. Google followed with Google-Extended. But compliance is voluntary, retroactive blocking doesn't remove already-collected data, and many website operators, especially individual creators, have no idea these crawlers exist.
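For site operators who do want to opt out, the mechanics come down to a few lines of plain text served at the site root. A minimal robots.txt using the crawler tokens OpenAI and Google have documented (GPTBot and Google-Extended) looks like the following; note that it is a request, not an enforcement mechanism:

```
# Block the documented AI-training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Ordinary search crawlers remain unaffected
User-agent: *
Disallow:
```

Google-Extended isn't even a separate crawler; it's a control token that Googlebot checks before content is used for AI training, exactly the kind of subtlety individual creators are unlikely to know about.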
Critics argue the opt-out model is fundamentally unfair:
- It places the burden on millions of individual creators rather than on a handful of well-resourced AI companies
- Robots.txt was never designed for this purpose and provides no legal enforcement mechanism
- Data already scraped before a block is added remains in training datasets
- Many creators lack the technical knowledge to implement blocks
- New crawlers emerge constantly; blocking one doesn't block others
The AI training exclusion header, a proposed HTTP response header that signals "do not use this content for AI training," has gained traction among web standards bodies but remains voluntary and unenforceable without legislation.
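No single form has been standardized, but as a sketch, a response carrying the informal "noai" tokens (popularized by sites like DeviantArt and honored by some scraping tools) would look like this:

```
HTTP/1.1 200 OK
Content-Type: text/html
X-Robots-Tag: noai, noimageai
```

Until legislation attaches consequences, a crawler that ignores this header faces nothing worse than reputational risk.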
Content creators and publishers increasingly demand an opt-in regime: AI companies should license content before training, not scrape first and negotiate later. This is the model that OpenAI has reluctantly begun adopting through its licensing deals.
The Licensing Gold Rush
Even as it fights the Times in court, OpenAI has moved aggressively to secure licensing agreements with major publishers, a strategy that simultaneously builds a "clean" training data pipeline and undercuts the legal argument that licensing is impractical:
- Associated Press: Licensing deal announced July 2023, granting OpenAI access to AP's news archive. Financial terms undisclosed but reported to be in the low millions annually.
- Axel Springer (Politico, Business Insider): Licensing deal announced December 2023, reportedly worth tens of millions of euros over a multi-year term. Includes integration of Axel Springer content into ChatGPT responses with attribution and links.
- Reddit: Reddit's archive of user-generated content became the subject of a blockbuster licensing deal reported at roughly $60 million per year in February 2024 (widely reported to involve Google), and OpenAI announced its own Reddit partnership in May 2024 on undisclosed terms. The arrangements came to light around Reddit's IPO filing and drew criticism from Reddit users who argued their posts were being sold without consent.
- Le Monde and Prisa Media: European publisher deals announced in 2024, giving OpenAI access to French and Spanish-language news content.
- Vox Media, The Atlantic, Time: Additional US media deals signed throughout 2024-2025, each reportedly in the range of $5-10 million annually.
These deals create a two-tier system that troubles many observers. Major publishers with legal resources and negotiating leverage can extract significant payments. Individual creators (freelance writers, independent artists, small bloggers) have no practical way to negotiate and no meaningful recourse if their work is used. The licensing model may protect the New York Times while leaving the vast majority of creators uncompensated.
Adobe has taken a different approach with its Content Credentials initiative (part of the Content Authenticity Initiative). Content Credentials embed cryptographic metadata into images and documents that can signal whether the creator permits AI training. Adobe trained its Firefly image generation model exclusively on licensed Adobe Stock images, content in the public domain, and openly licensed content โ allowing it to offer legal indemnification to commercial users. This "clean data" approach has become a competitive differentiator, though critics note it limits the model's capabilities compared to competitors trained on the open web.
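Content Credentials itself is a detailed standard (C2PA), but the core idea is simple to sketch. The Python below is a toy illustration, not the C2PA format (which defines its own manifest structure, certificate chains, and embedding rules): it binds a creator's assertions, including an AI-training preference, to a hash of the exact image bytes with a digital signature, so stripping or altering the flag becomes detectable. It assumes the third-party cryptography package, and the field names are invented for illustration.

```python
# Toy sketch of provenance-style signed metadata -- NOT the real C2PA format.
import hashlib
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_credentials(image_bytes: bytes, key: Ed25519PrivateKey) -> dict:
    """Bind creator assertions to the exact image bytes via a signature."""
    claim = {
        "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "creator": "Jane Artist",        # hypothetical assertion
        "ai_training_allowed": False,    # the opt-out signal
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    return {"claim": claim, "signature": key.sign(payload).hex()}

def verify(image_bytes: bytes, manifest: dict, public_key) -> bool:
    """Any change to the pixels or the claim invalidates the manifest."""
    if manifest["claim"]["image_sha256"] != hashlib.sha256(image_bytes).hexdigest():
        return False
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(manifest["signature"]), payload)
        return True
    except Exception:
        return False

key = Ed25519PrivateKey.generate()
image = b"...raw image bytes..."
manifest = sign_credentials(image, key)
print(verify(image, manifest, key.public_key()))            # True
print(verify(image + b"edit", manifest, key.public_key()))  # False
```

The enforcement gap is the same one that dogs robots.txt: nothing stops a scraper from ignoring the flag. The credential only makes the creator's preference provable after the fact.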
State Legislative Responses
With federal legislation stalled, states have begun enacting their own protections โ creating a patchwork of laws that AI companies must navigate:
Tennessee's ELVIS Act (Ensuring Likeness Voice and Image Security Act), signed into law in March 2024, was the first state law specifically addressing AI and creative rights. The law protects musicians and performers by making it illegal to use AI to clone a person's voice without permission. Named with a nod to Tennessee's most famous musical export, the ELVIS Act was championed by the music industry and reflects Nashville's outsized influence in state politics. Violations can result in both civil liability and criminal penalties.
The ELVIS Act has become a model for other states. As of early 2026:
- California has introduced multiple bills addressing AI and creative rights, including proposals to require AI companies to disclose training data sources and to create a right of action for creators whose works are used without consent. California's AB 2602 and AB 1836 (both signed in 2024) protect performers against unauthorized AI replicas of their voice and likeness.
- New York has proposed state-level legislation mirroring the federal NO FAKES Act, which would create a right protecting individuals against unauthorized AI replicas.
- Illinois, building on its Biometric Information Privacy Act (BIPA) precedent, has introduced bills extending biometric protections to AI-generated voice and facial replicas.
- More than 15 states introduced bills addressing AI and copyright in 2025-2026 legislative sessions.
The EU's Transparency Approach
The EU AI Act, which entered into force in August 2024 with phased implementation through 2027, takes a notably different approach to the training data question. Rather than resolving the fair use/copyright question directly, the AI Act imposes transparency obligations on providers of general-purpose AI (GPAI) models:
- GPAI providers must publish a "sufficiently detailed summary" of the training data used, following a template developed by the EU AI Office (a hypothetical sketch follows this list)
- Providers must have a policy to comply with EU copyright law, including the text and data mining opt-out provisions of the 2019 EU Copyright Directive
- The EU Copyright Directive already provides that rights holders can reserve their rights against text and data mining, effectively creating an opt-out regime with legal teeth
- Penalties for non-compliance can reach €15 million or 3% of global annual turnover, whichever is higher
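What a "sufficiently detailed summary" might contain is easiest to see in miniature. The sketch below is purely hypothetical (the AI Office's actual template is its own document, and every field name here is invented), but a disclosure along these lines, enumerating source categories, collection periods, and opt-out handling, is the kind the Act contemplates:

```json
{
  "provider": "ExampleAI GmbH",
  "model": "example-gpai-v1",
  "training_data_summary": [
    {
      "category": "public web crawl",
      "collection_period": "2022-2024",
      "tdm_opt_outs_respected": true
    },
    {
      "category": "licensed publisher archives",
      "basis": "negotiated agreements with named rights holders"
    },
    {
      "category": "public domain and openly licensed corpora"
    }
  ]
}
```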
The EU approach sidesteps the American fair use debate entirely. In the EU, the question isn't whether AI training is "fair"; it's whether rights holders have opted out of text and data mining, and whether AI companies have respected those opt-outs. This creates a clearer legal framework but also a more restrictive one for AI development.
What Happens Next
The copyright war is approaching several inflection points in 2026:
The NYT v. OpenAI case is likely to produce significant rulings on discovery and potentially summary judgment motions. OpenAI has argued that the Times manipulated ChatGPT prompts to produce near-verbatim reproductions that aren't representative of normal use. The Times counters that the ability to extract its content at all proves the infringement. A ruling on fair use in this case โ likely from the Southern District of New York โ would set a powerful precedent even before any appellate review.
Congressional action remains possible but uncertain. The NO FAKES Act (S. 4875 in the 118th Congress), which would create a federal right protecting voice and likeness against unauthorized AI replicas, has bipartisan support and backing from SAG-AFTRA and the music industry. The AI Disclosure Act and various transparency proposals could also move, particularly if courts don't provide clear answers quickly enough for Congress's liking.
The licensing market is maturing rapidly. As more publishers sign deals with AI companies, a market price for training data is emerging. This could eventually make the fair use question less relevant: if licensing becomes standard practice, the legal battles over past training may settle while future training operates under negotiated agreements. But this leaves individual creators in the cold unless collective licensing mechanisms (similar to music performance rights organizations like ASCAP and BMI) emerge for AI training.
The fundamental tension remains unresolved: AI companies built trillion-dollar capabilities on the creative output of millions of people who were never asked and never compensated. Whether courts call that fair use or infringement, whether legislatures create new rights or defer to existing law, the copyright war is really a question about the value of human creativity in an age of machine intelligence, and about who captures the economic gains when machines learn from human work.
We're tracking every lawsuit, licensing deal, and legislative proposal on our Bill Tracker and Follow the Money pages. The copyright war is far from over, and the stakes couldn't be higher.