The Copyright War: AI Companies vs Content Creators

Training data lawsuits, the No AI FRAUD Act, and the battle over who owns AI outputs

The fight over AI training data and copyright is intensifying, with major lawsuits from the New York Times and music labels, the No AI FRAUD Act in Congress, and billions of dollars at stake.

The legal and legislative battle over AI and copyright has become one of the most consequential policy fights in the technology sector. At its core is a deceptively simple question: can AI companies use copyrighted material to train their models without permission or payment? The answer will reshape the economics of both the AI industry and the creative economy, and the fight is playing out simultaneously in courtrooms, Congress, and regulatory agencies.

The lawsuits are piling up. The New York Times' suit against OpenAI and Microsoft, filed in December 2023, alleges that GPT models were trained on millions of Times articles without authorization. Getty Images sued Stability AI over the use of 12 million copyrighted photographs to train Stable Diffusion. The Authors Guild represents thousands of writers in a class action against OpenAI. Music labels Universal, Sony, and Warner sued AI music generators Suno and Udio for training on copyrighted recordings. As of early 2026, at least 23 major copyright lawsuits against AI companies are pending in federal courts.

In Congress, the No AI FRAUD Act (No Artificial Intelligence Fake Replicas And Unauthorized Duplications Act) has emerged as the leading legislative response. The bill would establish a federal right to control the use of one's voice and likeness in AI-generated content, create penalties for unauthorized AI replicas of real people, and, most controversially, require AI companies to maintain records of copyrighted works used in training data. The bill has bipartisan support but faces intense lobbying from the AI industry. See our federal policy tracker for the latest status.

The AI industry's legal defense rests primarily on fair use — the copyright doctrine that permits limited use of copyrighted material for transformative purposes without permission. Companies argue that training an AI model on copyrighted text or images is analogous to a human reading books or studying art: the model learns patterns and concepts, not specific works. OpenAI, Google, and Meta have all made this argument in court filings and policy statements. The counterargument from creators is that AI models can and do reproduce substantial portions of copyrighted works, and that the "transformative" use doctrine was never intended to cover industrial-scale copying for commercial AI systems.

Lobbying on AI copyright bills has been heavy on both sides. The AI industry, including OpenAI, Google, Meta, and Microsoft, spent a combined $14.2 million lobbying on copyright-related AI provisions in 2025. On the other side, the creative industries, including the RIAA, MPA, News/Media Alliance, and Authors Guild, spent $9.8 million. The Recording Industry Association of America has been particularly aggressive, framing AI training on copyrighted music as "the largest act of copyright infringement in history."

The Copyright Office has attempted to provide clarity, issuing guidance in 2024 that AI-generated works are generally not copyrightable but leaving the training data question unresolved. The office is conducting a comprehensive study expected in late 2026. Meanwhile, EU copyright law provides a limited text and data mining exception that covers AI training, with an opt-out mechanism for rights holders that the EU's AI Act requires model providers to respect. Some U.S. lawmakers are studying this framework as a potential model. The outcome of the copyright war will determine not just who pays for AI training data, but the fundamental business model of the generative AI industry.
