In brief
- Authors E. Molly Tanzer and Jennifer Gilmore have sued Salesforce, alleging it “pirated hundreds of thousands of copyrighted books” to develop its XGen AI models.
- The lawsuit claims Salesforce initially disclosed using the “RedPajama-Books” dataset in June 2023, then deleted those references two months later and rebranded its training data as simply “publicly available.”
- In an interview with Bloomberg, Salesforce CEO Marc Benioff previously said AI companies “ripped off” training data and that “all the training data has been stolen.”
A new class action lawsuit in San Francisco federal court has accused software giant Salesforce of building its XGen AI models on a pirated library of books and then scrubbing references to those sources once questions arose.
Filed on Wednesday by authors E. Molly Tanzer and Jennifer Gilmore, the suit is brought under the Copyright Act and alleges ongoing infringement, saying Salesforce “continues to do so by continuing to store, copy, use, and process the datasets containing copies of Plaintiffs’ … copyrighted books.”
The complaint says Salesforce Inc. “pirated hundreds of thousands of copyrighted books to develop its XGen series of large language models,” relying on the “notorious RedPajama and The Pile datas
Author: Vismaya V
