FP employees
Oct 10, 2022 5:49:10 PM IST
Just days after Meta announced its text-to-video generator, Google has revealed that it is close to unveiling its own AI-powered text-to-video generator, which it calls Google Imagen Video.
The generator is still in development, but when it reaches a public-facing state, it will be able to produce 1280×768 videos at 24 frames per second from a simple written prompt.
According to Google’s research paper, Imagen Video will have stylistic capabilities such as creating videos based on the work of famous artists like Vincent van Gogh. It will also create rotating 3D objects while preserving their structure, and render text in different animation styles.
Google’s new Imagen Video AI turns text descriptions into high-resolution 5.3-second videos 🤩🤩🤩 pic.twitter.com/KhvsvGqLFh
— Tansu YEĞEN (@TansuYegen) October 8, 2022
According to Google, Imagen Video was trained on 14 million video-text pairs and 60 million image-text pairs, as well as the publicly available LAION-400M image-text dataset. LAION data was also used to train Stable Diffusion.
Google hopes its AI video model can “significantly reduce the difficulty of generating quality content.” Imagen Video builds on Google’s Imagen, a text-to-image program similar to OpenAI’s DALL-E.
As described in Google’s research paper, Imagen Video takes a text description and generates a 16-frame video at three frames per second and a resolution of 24×48 pixels. The system then upscales the clip and “predicts” additional frames, producing a final 128-frame video at 24 frames per second and 1280×768 resolution, roughly 720p.
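The figures above can be checked with a little arithmetic. The following is a minimal Python sketch, not Google's code: the helper `clip_stats` and its field names are hypothetical, and the only inputs are the frame counts, frame rates and resolutions quoted in this article. It shows that the low-resolution base clip and the final clip cover the same stretch of time, so the extra frames raise the frame rate rather than lengthen the video.

```python
from fractions import Fraction


def clip_stats(frames, fps, width, height):
    """Return duration (seconds) and pixels per frame for a clip.

    Hypothetical helper for illustration only; it is not part of Imagen Video.
    """
    return {
        "duration_s": Fraction(frames, fps),  # exact value, no rounding error
        "pixels_per_frame": width * height,
    }


# Numbers quoted in the article. Which of 24 and 48 is the width is an
# assumption; the pixel count per frame is the same either way.
base = clip_stats(frames=16, fps=3, width=48, height=24)
final = clip_stats(frames=128, fps=24, width=1280, height=768)

# How much the cascade multiplies the frame count, frame rate and pixel count.
frame_factor = Fraction(128, 16)
fps_factor = Fraction(24, 3)
pixel_factor = Fraction(final["pixels_per_frame"], base["pixels_per_frame"])

print(f"base duration:  {float(base['duration_s']):.2f} s")
print(f"final duration: {float(final['duration_s']):.2f} s")
print(f"frames x{frame_factor}, fps x{fps_factor}, "
      f"pixels per frame x{float(pixel_factor):.0f}")
```

Both clips last 16/3 seconds, which is about 5.3 seconds. The frame count and the frame rate each grow by a factor of eight, while each frame gains several hundred times as many pixels.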
It is worth noting that all Imagen Video results shown so far were selected by Google itself, and no independent tester has yet tried the program.
However, the research paper claims that Imagen Video can properly render text, something that DALL-E and Stable Diffusion both struggle with. The text those programs generate is often barely legible.
Imagen Video is also said to have demonstrated an understanding of depth and three-dimensionality, which allows it to create drone fly-through videos that rotate around objects and capture them from different angles without distortion.
Google has expressed concern over “problematic data” used to train its AI image generator programs. The company has attempted to filter out sexually explicit or violent content, as well as social stereotypes and cultural bias. It is also concerned that the tool could be used “to generate fake, hateful, explicit or harmful content.”
“We have decided not to release the Imagen Video model or its source code until these concerns are resolved,” Google adds.