Another day, another artificial intelligence model. Alibaba Cloud, a subsidiary of Chinese conglomerate Alibaba Group and one of the world’s largest cloud computing companies, has unveiled its I2VGen-XL AI tool. It’s an advanced image-to-video system intended to compete against top-of-the-line models like the ones released by Pika Labs or Stability AI.
The company announced the release of the model’s weights today after publishing the model’s research paper last month.
I2VGen-XL is built on cascaded diffusion models, the paper explains, an AI technique intended to make the generated videos not only visually impressive but also contextually coherent and semantically accurate. It works in two stages: the base stage focuses on keeping the video coherent with the input text and images, and the refinement stage enhances the video's details and raises its resolution to as much as 1280×720 pixels.
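To make the two-stage idea concrete, here is a toy Python sketch of a cascade. It is only an illustration under loud assumptions: `base_stage`, `refine_stage`, and `generate` are hypothetical stand-ins invented for this example, not I2VGen-XL's actual networks or API, and simple nearest-neighbour upscaling takes the place of the diffusion-based refinement the paper describes.

```python
# Toy illustration only: these functions are hypothetical stand-ins,
# not I2VGen-XL's actual networks or API.

def base_stage(image, num_frames):
    """Stage 1 (stand-in): produce low-resolution frames that stay
    coherent with the input by reusing the input image for every frame."""
    return [[row[:] for row in image] for _ in range(num_frames)]


def refine_stage(frames, width=1280, height=720):
    """Stage 2 (stand-in): raise each frame to the target resolution
    using nearest-neighbour upscaling."""
    refined = []
    for frame in frames:
        src_h, src_w = len(frame), len(frame[0])
        refined.append([
            [frame[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)
        ])
    return refined


def generate(image, num_frames=2):
    """Run the cascade: base stage first, refinement stage second."""
    return refine_stage(base_stage(image, num_frames))


low_res = [[0, 1], [2, 3]]  # a tiny 2x2 "image"
video = generate(low_res)
print(len(video), len(video[0]), len(video[0][0]))  # frames, height, width
```

The point of the split is that the first stage only has to get the content right at low resolution, while the second stage only has to add detail to frames that already match the input.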
This technique may sound similar to the one used to generate images with SDXL. Unlike SD 1.5 and SD 2.1, which relied on a single model, SDXL consists of two models developed by Stability AI, a base and a refiner, which are meant to be combined to generate the best quality images possible.
Alibaba Cloud says the model was trained on an extensive dataset of around 35 million text-video pairs and a staggering 6 billion text-image pairs. Such a vast dataset ensures the mod
Author: Jose Antonio Lanz