Encyclopedia Britannica Sues OpenAI Over Alleged Training Data Copyright Infringement

Encyclopedia Britannica has filed a lawsuit against OpenAI in federal court, alleging that the AI company used its copyrighted educational content to train large language models without permission or compensation.

The complaint, filed in the Southern District of New York, claims OpenAI scraped and ingested vast amounts of Britannica's authoritative reference material, including encyclopedia entries, educational resources, and expert-written articles. Britannica argues this constitutes willful copyright infringement on a massive scale.

"For 250 years, Britannica has invested in creating accurate, expert-verified content," said Jorge Cauz, Britannica's CEO. "We cannot allow our intellectual property to be used without authorization to train AI systems that compete with our core business."

The lawsuit seeks both monetary damages and injunctive relief to prevent further unauthorized use. Britannica is also requesting that OpenAI disclose which specific content was used to train its models.

This case joins a growing wave of copyright litigation against AI companies. The New York Times, authors including Sarah Silverman and Michael Chabon, Getty Images, and music publishers have all filed similar lawsuits arguing that training AI models on copyrighted works without licensing agreements constitutes infringement.

OpenAI has argued in previous cases that its use of training data constitutes fair use under copyright law because the use is transformative. However, courts have yet to rule definitively on this question.

The outcome of these cases could fundamentally reshape how AI companies acquire training data and whether they must compensate original content creators.