Social media giant Meta has introduced its latest artificial intelligence (AI) models for content editing and generation, according to a blog post on Nov. 16.
The company is introducing two AI-powered generative models. The first, Emu Video, builds on Meta’s previous Emu model and can generate video clips from text and image inputs. The second, Emu Edit, focuses on image manipulation and promises more precise image editing.
The models are still in the research stage, but Meta says its initial results show potential use cases for creators, artists and animators alike.

According to Meta’s blog post, Emu Video was trained with a “factorized” approach, dividing the training process into two steps so the model can respond to different inputs:
“We’ve split the process into two steps: first, generating images conditioned on a text prompt, and then generating video conditioned on both the text and the generated image. This ‘factorized’ or split approach to video generation lets us train video generation models efficiently.”
The same model can “animate” images based on a text prompt. According to Meta, instead of relying on a “deep cascade of models,” Emu Video uses just two diffusion models to generate 512×512, four-second videos at 16 frames per second.
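The two-step pipeline Meta describes can be sketched roughly as follows. This is an illustrative stub, not Meta’s actual code or API: the function names, data shapes, and placeholder outputs are hypothetical stand-ins for the two diffusion models.

```python
# Hypothetical sketch of the "factorized" two-step generation pipeline:
# stage 1 maps a text prompt to an image, stage 2 maps (prompt, image)
# to a sequence of video frames. Both stages are stubs standing in for
# the diffusion models Meta describes.

def generate_image(prompt):
    """Stage 1 (stub): a text-conditioned diffusion model would return
    a 512x512 image; here we return a placeholder record."""
    return {"size": (512, 512), "prompt": prompt}

def generate_video(prompt, image, seconds=4, fps=16):
    """Stage 2 (stub): a diffusion model conditioned on both the text
    prompt and the stage-1 image would emit the video frames."""
    num_frames = seconds * fps  # 4 s at 16 fps -> 64 frames
    return [{"frame": i, "conditioned_on": image["size"]}
            for i in range(num_frames)]

def emu_video_factorized(prompt):
    image = generate_image(prompt)        # step 1: text -> image
    return generate_video(prompt, image)  # step 2: (text, image) -> video

frames = emu_video_factorized("a corgi surfing a wave")
print(len(frames))  # 64 frames: four seconds at 16 frames per second
```

The point of the split is that each stage solves a simpler conditional problem, which is what Meta credits for making training efficient.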
Emu Edit, focused on image manipulation, will let users add or remove backgrounds, perform color and geometry transformations, and make both local and global edits to images.
“We argue that the primary objective shouldn’t just be about producing a ‘believable’ image. Instead, the model should focus on precisely altering only the pixels relevant to the edit request,” Meta noted, claiming its model is able to precisely follow instructions:
“For instance, when adding the text ‘Aloha!’ to a baseball cap, the cap itself should remain unchanged.”
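The principle in the quote above, changing only the pixels relevant to the edit, can be illustrated with a toy masked-edit sketch. The mask and edit function here are hypothetical stand-ins for what a trained model would predict; this is not Emu Edit’s actual mechanism.

```python
# Toy illustration of edit locality: apply an edit only where a mask
# marks pixels as relevant, leaving every other pixel byte-identical.
# The mask and the edit function stand in for model predictions.

def masked_edit(image, mask, edit):
    """Return a copy of `image` with `edit` applied only where mask is True."""
    return [
        [edit(px) if m else px for px, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]

image = [[10, 20], [30, 40]]
mask = [[True, False], [False, False]]  # only one pixel is "relevant"
edited = masked_edit(image, mask, lambda px: 255)
print(edited)  # [[255, 20], [30, 40]] - untouched pixels stay identical
```

In the cap example from the quote, the mask would cover only the region where “Aloha!” is drawn, so the rest of the cap and the background pass through unchanged.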
Meta trained Emu Edit using computer vision tasks with a data set of 10 million synthesized images, each with
Author: Ana Paula Pereira