As AI tools become more commonplace, lawsuits centering on the training data used to create machine learning models are piling up.
Now, OpenAI, the company behind the popular ChatGPT text-generation tool and underlying models, has moved to dismiss most of the claims against it in lawsuits filed by authors who allege that AI outputs infringe on their copyright.
The machine learning models that power ChatGPT, such as GPT-4, are trained on massive amounts of data scraped from the internet. Although OpenAI has not released any information on the training data for GPT-4, the firm has admitted that training data for its earlier GPT-3 model included "internet-based books corpora," meaning a database of books available online. The authors suing OpenAI in two separate lawsuits—Sarah Silverman, Christopher Golden, and Richard Kadrey filed one, and Paul Tremblay and Mona Awad filed another—allege that every output that ChatGPT makes is thus a derivative work of their books and infringes copyright.
OpenAI filed identical motions to dismiss the majority of claims in both lawsuits in a California court on Monday. According to the company, the authors' claims are "defective" and all but one should be dismissed. The claims that OpenAI has moved to dismiss include vicarious copyright infringement, violation of the Digital Millennium Copyright Act, unfair competition, negligence, and unjust enrichment. If the motions are granted, the cases will center on a single claim of direct copyright infringement.
"It is important for these claims to be trimmed from the suit at the outset," the OpenAI motion states, "so that these cases do not proceed to discovery and beyond with legally infirm theories of liability." Discovery is a legal process that forces parties to disclose documents and details about their internal processes if they are relevant to the lawsuit.
The firm's reasons for filing to dismiss are varied, but they generally rely on framing the authors' claims as overbroad and based on a misunderstanding of the technology. OpenAI argues that "in only a remote and colloquial sense" are all ChatGPT outputs "based on" any individual author's work, and that considering every output to violate everybody's copyright, even if the output bears no resemblance to the copyrighted work, would be "frivolous" and is "not how copyright law works."
Since the rise of AI tools like ChatGPT and DALL-E, creators have called for restricting or banning the use of AI generators. They point out that the tools enable companies to generate works without paying artists and authors, creating a massive labor shift that has already begun to infiltrate newsrooms, publishing, and the film industry.
“Generative AI art is vampirical, feasting on past generations of artwork even as it sucks the lifeblood from living artists,” reads an open letter signed by more than 3,000 artists and creators opposing the use of AI tools in publishing. “While illustrators’ careers are set to be decimated by generative-AI art, the companies developing the technology are making fortunes. Silicon Valley is betting against the wages of living, breathing artists through its investment in AI.”
OpenAI argues that GPT models use "a staggeringly large series of statistical correlations" to generate text, and that such statistical information, including "word frequencies, syntactic patterns, and thematic markers," is not copyrightable. Thus, OpenAI argues, the claim of vicarious copyright infringement, which rests on the assertion that any GPT output violates the authors' copyright, should be thrown out. As for direct copyright infringement, OpenAI argues that the authors have not made a specific claim and that generating a wholesale copy of a work in order to create a new non-infringing product is fair use.
Whether or not the court grants OpenAI's motion to dismiss the majority of claims against it, plaintiffs have demanded a jury trial.