The Dictionary Takes On OpenAI

Encyclopaedia Britannica, the parent company of dictionary publisher Merriam-Webster, has launched legal action against OpenAI, accusing the artificial intelligence firm of widespread copyright violations. The lawsuit claims the technology company built its powerful language models using copyrighted material without consent.

At the centre of the dispute sits a vast digital archive. Encyclopaedia Britannica holds copyright over nearly 100,000 online articles. According to the complaint, OpenAI scraped this material and incorporated it into the training process for its large language models, including systems used in ChatGPT.

For publishers, the issue echoes a familiar concern: control over intellectual property in the age of automation. A company invests years developing authoritative content, then discovers that a machine may have absorbed it in seconds. What happens when the machine begins answering the same questions readers once searched for on the publisher’s site?

The lawsuit argues that OpenAI crosses another legal line when its models produce responses containing “full or partial verbatim reproductions” of Britannica’s work. Britannica also claims that its content appears inside ChatGPT’s retrieval-augmented generation system, often referred to as RAG.

RAG functions as a live information pipeline. When a user asks a question, the system scans external databases or the wider web to gather up-to-date information before composing an answer. Britannica argues that its articles are drawn into this process without permission.
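To make the pipeline concrete, here is a minimal sketch of the retrieval step described above. The toy corpus, the word-overlap scoring, and the prompt format are illustrative assumptions for this article, not OpenAI's actual implementation, which uses far more sophisticated search.

```python
# Minimal sketch of retrieval-augmented generation (RAG).
# Corpus, scoring, and prompt format are illustrative assumptions,
# not any vendor's real implementation.

def tokenize(text: str) -> set[str]:
    """Split text into a set of lowercase words (a crude stand-in for real tokenization)."""
    return set(text.lower().split())

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query.

    Real systems use vector embeddings and approximate nearest-neighbour
    search; simple set intersection keeps the idea visible.
    """
    scored = sorted(corpus,
                    key=lambda doc: len(tokenize(query) & tokenize(doc)),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend the retrieved passages so a language model can answer from fresh sources."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The legal complaint targets exactly this step: if the documents fed into `build_prompt` are a publisher's copyrighted articles, the model's answer is composed from that material each time a user asks a question.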

The publisher raises an additional claim under the Lanham Act. Britannica contends that when ChatGPT generates inaccurate information and attributes it to the publisher, the system creates reputational damage through false attribution.

The complaint puts the issue in stark commercial terms:

“ChatGPT starves web publishers like [Britannica] of revenue by generating responses to users’ queries that substitute, and directly compete with, the content from publishers like [Britannica].”

Executives at Britannica argue the consequences extend beyond revenue. The lawsuit warns that the practice threatens “the public’s continued access to high-quality and trustworthy online information.”

The claim touches a deeper question facing the digital economy. If AI systems deliver answers directly, readers may never visit the sites that created the information in the first place. Consider the parallel in everyday work: imagine building a detailed report only to discover a colleague summarised it, distributed the summary, and removed the need for anyone to read your original analysis.

Britannica’s lawsuit forms part of a widening legal front against OpenAI. Major publishers and media organisations have begun testing how copyright law applies to artificial intelligence training.

Recent plaintiffs include:

  • The New York Times
  • Ziff Davis, owner of outlets such as Mashable, CNET, IGN and PCMag
  • Newspapers including the Chicago Tribune, Denver Post and Sun Sentinel
  • Canadian outlets such as the Toronto Star and the Canadian Broadcasting Corporation

Britannica has also pursued separate litigation against Perplexity AI. That case remains unresolved.

Courts have not yet produced a definitive ruling on whether training large language models on copyrighted text violates intellectual property law. Judges face a fundamental question: does machine learning simply copy content, or does it transform it into something new?

A recent case involving Anthropic offers a glimpse of how courts may approach the issue. In that dispute, U.S. federal judge William Alsup accepted the argument that using written material as training data could qualify as a transformative use.

Yet the same ruling drew a line elsewhere. Alsup determined that Anthropic broke the law when it downloaded millions of books illegally rather than purchasing legitimate copies. That decision triggered a $1.5 billion class-action settlement benefiting affected authors.

The legal battles now unfolding may shape the future of generative AI. If courts decide that training data requires permission and payment, technology firms may need to negotiate large licensing agreements with publishers.

Another possibility looms. If judges conclude the practice counts as transformative use, publishers may struggle to control how their work fuels the next generation of AI systems.

The outcome will ripple far beyond dictionaries and encyclopaedias. Every industry that produces original content — journalism, academic publishing, entertainment, education — now watches closely.

One question hangs over the entire debate: if artificial intelligence learns from humanity’s knowledge, who owns the value it creates next?

Author: George Nathan Dulnuan
