Stability AI, creator of the open-source generative image tool Stable Diffusion, has removed a widely used AI training dataset after researchers found that the data scraper had ingested child sexual abuse material, or CSAM.
The discovery was made by Stanford scientists and reported by 404 Media.
Large language models and AI image generators like Stable Diffusion, Midjourney, and DALL-E rely on massive datasets for training before they can generate content. Many of these datasets, like LAION-5B, include images scraped from the internet.
Some of those scraped images depict harm to minors, material that is illegal in virtually every jurisdiction.
“Many older models, for example, were trained on the manually labeled ImageNet corpus, which features 14 million images spanning all types of objects,” Stanford researcher David Thiel wrote. “However, more recent models, such as Stable Diffusion, were trained on the billions of scraped images in the LAION-5B dataset.”
“This dataset, being fed by essentially unguided crawling, includes a significant amount of explicit material,” Thiel explained.
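For context on how such material can slip through: datasets like LAION-5B are distributed not as images but as large tables of image URLs and captions, typically accompanied by automatically computed safety scores that downstream users can filter on before downloading anything. The sketch below illustrates that filtering step. The file name, column names ("url", "caption", "punsafe"), and threshold are illustrative assumptions rather than the actual LAION schema, and automated scores of this kind are an imperfect safeguard, which is how illegal material can remain in the data.

```python
# Minimal sketch: filtering a LAION-style metadata shard by a safety score.
# Assumptions (not taken from the article): the shard is a Parquet file named
# "laion_shard.parquet" with columns "url", "caption", and "punsafe", where
# "punsafe" is an estimated probability that the linked image is unsafe.
import pandas as pd

UNSAFE_THRESHOLD = 0.1  # hypothetical cutoff; stricter pipelines use lower values


def load_filtered_shard(path: str) -> pd.DataFrame:
    """Load one metadata shard and drop rows whose unsafe score exceeds the cutoff."""
    shard = pd.read_parquet(path, columns=["url", "caption", "punsafe"])
    kept = shard[shard["punsafe"] <= UNSAFE_THRESHOLD]
    print(f"kept {len(kept)} of {len(shard)} rows after safety filtering")
    return kept


if __name__ == "__main__":
    filtered = load_filtered_shard("laion_shard.parquet")
```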
Author: Jason Nelson