OpenAI in a legal labyrinth: lingering lawsuits and uncertainty in the copyright field.
By Mariel Chichisola and Florencia Gutiérrez
On June 28th, 2023, writers Mona Awad and Paul Tremblay filed a lawsuit in San Francisco Federal Court (Case 3:23-cv-03223) alleging copyright infringement by ChatGPT. They argue that the AI generates “highly accurate summaries” of their works, and they characterize this as unauthorized use of their intellectual property.
According to the plaintiffs, OpenAI used Awad’s books “Bunny” and “13 Ways of Looking at a Fat Girl” and Tremblay’s “The Cabin at the End of the World” to train ChatGPT. According to the complaint, OpenAI is “unfairly” benefiting from results obtained from their literary works.
The lawsuit seeks to challenge the legality of OpenAI’s program on the grounds that OpenAI used Paul Tremblay and Mona Awad’s books as part of its training data set. Sample summaries are attached as exhibits to the lawsuit.
The complaint focuses on a form of indirect profiteering rather than direct plagiarism. Although tracing the origin of training data across the vast internet is difficult, certain literary circles see the legal action as a breakthrough in strengthening copyright protection.
Although OpenAI has not openly and transparently disclosed what data and texts it used to train ChatGPT, it has said that they come from the internet, including sources such as Wikipedia. The defendant refers in particular to Project Gutenberg, a database open to the public with more than 60,000 titles whose copyrights have already expired.
The suit by Tremblay and Awad represents the beginning of a class action lawsuit that could affect all persons or entities in the United States that own copyrights to works used as training data for OpenAI language models. While it does not explicitly allege plagiarism, the lawsuit raises interesting questions about the benefits these technologies gain at the expense of copyright holders.
Experts in the field anticipate more lawsuits as AI becomes more adept at using information from the web to generate new content. This raises recurring questions about the sources behind AI-generated content: does this use constitute plagiarism, and can it be considered indirect profit? The Tremblay and Awad lawsuit could shed some light on these issues, which for the time being remain in a legal gray area.
Bibliography:
https://www.lavanguardia.com/tecnologia/20230819/9169702/ia-deja-aire-derechos-autor-fotografia.html
https://www.theguardian.com/books/2023/jul/05/authors-file-a-lawsuit-against-openai-for-unlawfully-ingesting-their-books