Publisher 'Chicken Soup for the Soul' Sues Apple, Google, OpenAI Over AI Training
Publisher Chicken Soup for the Soul Enterprises has filed a lawsuit in California federal court, accusing eight major tech companies, including Apple, Google, and OpenAI, of illegally using copyrighted books to train their AI models. The lawsuit alleges the companies sourced material from shadow library websites without permission.
SAN FRANCISCO — Chicken Soup for the Soul Enterprises, the publisher behind the popular inspirational book series, has launched a major copyright lawsuit against eight leading technology firms, alleging the illegal use of its books and other copyrighted works to train artificial intelligence systems.
The lawsuit, filed Tuesday in a California federal court, names a who's who of the AI industry as defendants: Apple, Google, Nvidia, Meta, OpenAI, Anthropic, Perplexity AI, and Elon Musk's xAI.
Core Allegations: Training on 'Shadow Libraries'
The publisher's central claim is that these companies systematically copied vast quantities of protected literary works without authorization. According to the complaint, the firms sourced these materials from so-called 'shadow library' websites known for hosting pirated content.
Specifically cited are datasets and sources including 'The Pile', LibGen, Z-Library, and Anna's Archive. The lawsuit alleges the defendants downloaded pirated copies from these sites, then parsed and embedded the text into their large language models (LLMs).
"This conduct constitutes clear copyright theft," the filing states, arguing it was done to "accelerate commercial development and win the generative AI race."
Apple in the Spotlight
The complaint singles out Apple for particular scrutiny. It claims 'Apple Foundation Models' relied on the datasets 'The Pile' and 'Books3' during training. The complaint alleges that these datasets contain works by bestselling authors and Pulitzer Prize winners, and that the works were used to refine Apple's AI products without compensation to the creators.
This is not the first time Apple has faced scrutiny over 'The Pile'. In a 2024 case involving AI training on YouTube videos, the dataset emerged as a point of contention. At that time, Apple stated the dataset was used solely for open-source research purposes and was "absolutely not" used to power its 'Apple Intelligence' system or any consumer-facing machine learning features.
The current lawsuit represents a significant escalation in the ongoing legal battles over the data used to train generative AI. It places a mainstream, well-known publisher directly against the largest players in the tech industry, setting the stage for a potentially landmark case on copyright in the AI era.
As of publication, none of the named defendants have issued public statements regarding the new lawsuit.