Microsoft’s VASA-1 Can Generate Realistic Human Videos From Images

Published On: April 18, 2024

Microsoft has unveiled VASA-1, an AI tool that can create videos of human faces directly from still images. It can also synchronize facial expressions when an audio clip is provided. The company has showcased several samples from VASA-1 on its official website, and the results have impressed AI enthusiasts.

Microsoft VASA-1 AI Video Generator

VASA-1, whose name refers to the "visual affective skills" (VAS) it aims to reproduce, is a Microsoft Research model built specifically around human facial expressions. It can convey a wide spectrum of feelings and emotions through facial dynamics, including movements of the facial muscles, lips, and nose, as well as head tilts and many other cues.

Here are some samples of videos generated from Microsoft VASA-1:

The First AI-Generated Video That Looks Super Real
Microsoft Research announced VASA-1.
It takes a single portrait photo and speech audio and produces a hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements… pic.twitter.com/6bxd4mEgFR
— Bindu Reddy (@bindureddy) April 17, 2024

Microsoft just introduced VASA-1.
It can generate photorealistic talking videos using just one photo and an audio file.
6 wild examples and demo below: pic.twitter.com/z4YIq4jYRx
— Eyisha Zyer 🪐 (@eyishazyer) April 18, 2024

Introducing: VASA-1 by Microsoft Research.
TL;DR: single portrait photo + speech audio = hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements, generated in real time.
Tap to see all the videos. pic.twitter.com/pPC6qZOBW2
— Eduardo Borges (@duborges) April 18, 2024

Currently, VASA-1 can generate videos at a maximum resolution of 512×512 pixels at 40fps. The company says the tool is designed to create videos that are as close as possible to real life.
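
To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It only works out what 512×512 at 40fps implies per frame and per second. The function names are illustrative and do not come from Microsoft, and the sketch assumes the quoted frame rate is sustained.

```python
# Back-of-the-envelope figures for the output format quoted in the article.
WIDTH, HEIGHT = 512, 512   # frame size in pixels, as reported
FPS = 40                   # frames per second, as reported


def frame_budget_ms(fps: int) -> float:
    """Time available to produce one frame at the given rate, in milliseconds."""
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw number of pixels that must be produced each second."""
    return width * height * fps


if __name__ == "__main__":
    print(f"Frame budget: {frame_budget_ms(FPS):.1f} ms")
    print(f"Pixels per frame: {WIDTH * HEIGHT:,}")
    print(f"Pixels per second: {pixels_per_second(WIDTH, HEIGHT, FPS):,}")
```

In other words, each frame has to be ready in about 25 milliseconds, which works out to roughly 10.5 million pixels every second.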

It is important to note that Microsoft has showcased VASA-1 only as a research demonstration. The company has clarified that it has no plans to release a product or any APIs related to VASA-1, citing the vast potential for misuse of the technology.

VASA-1 is broadly comparable to OpenAI's Sora in that both tools generate realistic-looking videos using AI. While VASA-1 is focused on human expressions, Sora can create complex videos with contextual backgrounds and artefacts.

However, neither tool has yet been released to the public. The official announcements from Microsoft and OpenAI highlight the capabilities and potential applications of VASA-1 and Sora in CGI and realistic AI-generated human avatars.

Google is also working on its own AI video generator, VideoPoet. Although the initial samples from VideoPoet are not as polished as those from VASA-1 or Sora, they show that Google, too, is trying to catch up in the AI video generation race.