In a landmark ruling with wide-reaching implications for the generative AI industry, Judge William Alsup of the U.S. District Court for the Northern District of California has ruled that AI company Anthropic’s digitization of legally purchased books for training purposes qualifies as fair use under U.S. copyright law.
However, the court drew a hard line on the company’s use of pirated material, calling it an unjustifiable act outside the protections of fair use.
This ruling could set a powerful precedent in the ongoing global debate about how large language models (LLMs) source their training data, especially as lawsuits from authors, artists, and publishers continue to pile up against top AI firms.
The lawsuit, a class action brought by a group of authors, revealed that Anthropic employed two data-collection strategies to train its Claude models:
- Purchased Books: The company reportedly spent millions of dollars buying physical books, which were disassembled and scanned into digital files to build an internal research library.
- Pirated Content: In 2021 and 2022, Anthropic downloaded more than 7 million pirated books—first from Library Genesis and later from the Pirate Library Mirror. Internal records showed cofounder Ben Mann and CEO Dario Amodei were aware of the pirated nature of the files.
Alsup distinguished between these two actions. While he acknowledged that the digitization of purchased books for internal machine learning purposes was transformative and legally protected under the doctrine of fair use, he condemned the piracy of copyrighted material, regardless of its intended use.
“Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy,” Alsup wrote.
“Anthropic had no entitlement to use pirated copies for its central library.”
Implications for AI and Copyright Law
Judge Alsup’s ruling is one of the first to address the legal gray areas surrounding AI model training and copyrighted material. With the industry racing to improve model performance through ever-larger datasets, the legality of using copyrighted works—particularly without permission or payment—has become a flashpoint.
Alsup ruled that the use of purchased books was “exceedingly transformative,” noting that Claude’s training process aims to synthesize new text rather than replicate the original content.
“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them—but to turn a hard corner and create something different,” he wrote.
An Anthropic spokesperson welcomed the decision, stating that it affirms the company’s commitment to responsible innovation.
Why It Matters in a Nigerian Context
As Nigerian developers and startups begin to explore AI model training and fine-tuning, this case serves as a key reference. Startups sourcing Nigerian literary or academic content must take extra care to ensure the material is legally obtained.
AI is becoming a central focus for tech innovation in Nigeria, but the country’s legal and creative frameworks are still catching up. As the AI space matures, respect for intellectual property will be crucial—not only to avoid litigation, but to foster sustainable partnerships with creators, publishers, and regulators.
The ruling comes at a critical time for AI developers, many of whom have operated in a legal gray zone while scraping massive datasets from the internet. With lawsuits now targeting OpenAI, Midjourney, and other firms for using protected works without permission, courts are beginning to define the legal boundaries of AI training.
While Alsup’s decision may offer some reassurance to companies using legally acquired materials, it also puts them on notice: sourcing pirated data is not defensible, even in the name of innovation.
Talking Points
Anthropic’s case demonstrates that not all training data is created equal—and neither is its legal standing. While the court upheld the transformative use of lawfully acquired books, it rejected the notion that innovation excuses theft.
This ruling doesn’t end the debate over how AI companies train their models. But it does offer a clearer path forward—one where creativity, compliance, and ethics must all coexist.
Let’s call it what it is: downloading millions of pirated books while knowing full well they’re stolen isn’t a gray area—it’s a digital heist dressed in a lab coat. This wasn’t accidental; it was deliberate. And if AI giants like Anthropic can justify piracy under the guise of innovation, what’s stopping others from stealing African data, culture, or literature next?
African policymakers and creatives must wake up to the reality that our stories, languages, literature, and cultural archives could be ingested into AI models without our consent. If Anthropic can raid the digital shelves of Western libraries, what’s stopping the next Claude or GPT from scraping entire African archives and tagging them as “open source”?