Machine learning researchers have produced a system that can recreate lifelike motion from just a single frame of a person’s face, opening up the possibility of animating not just photos but also paintings. It’s not perfect, but when it works, it is, like much AI work these days, eerie and fascinating.
The model is documented in a paper published by Samsung AI Center, which you can read here on Arxiv. It’s a new method of applying facial landmarks on a source face (any talking head will do) to the facial data of a target face, making the target face do what the source face does.
This in itself isn’t new; it’s part of the whole synthetic imagery issue confronting the AI world right now (we had an interesting discussion about this recently at our Robotics + AI event in Berkeley). We can already make a face in one video reflect the face in another in terms of what the person is saying or where they’re looking. But most of these models require a considerable amount of data, for instance a minute or two of video to analyze.
The new paper by Samsung’s Moscow-based researchers, however, shows that using only a single image of a person’s face, a video can be generated of that face turning, speaking and making ordinary expressions, with convincing, though far from flawless, fidelity.
It does this by frontloading the facial landmark identification process with a huge amount of data, making the model highly efficient at finding the parts of the target face that correspond to the source. The more data it has, the better, but it can do it with one image (called single-shot learning) and get away with it. That’s what makes it possible to take a picture of Einstein or Marilyn Monroe, or even the Mona Lisa, and make it move and speak like a real person.
It’s also using what’s called a Generative Adversarial Network, which essentially pits two models against one another, one trying to fool the other into thinking what it creates is “real.” By these means the results meet a certain level of realism set by the creators: the “discriminator” model has to be, say, 90% sure this is a human face for the process to continue.
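For readers who want to see the adversarial idea in miniature, here is a toy sketch in Python. It is not the Samsung model and has nothing to do with faces: the "images" are single numbers, and every name in it (`train_toy_gan`, `looks_real`, the parameters `b`, `w`, `c`) is a made-up illustration. The generator tries to produce numbers that resemble samples drawn around a target value, while the discriminator tries to tell them apart. The 0.9 threshold simply mirrors the "say, 90% sure" figure above and is an assumption, not something taken from the paper.

```python
import math
import random


def sigmoid(t):
    """Numerically safe logistic function, returns a value in [0, 1]."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def looks_real(x, w, c, threshold=0.9):
    """True if the toy discriminator is at least `threshold` sure x is real.

    The 0.9 default is illustrative only.
    """
    return sigmoid(w * x + c) >= threshold


def train_toy_gan(steps=2000, lr=0.05, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Generator:      fake = noise + b          (learns b)
    Discriminator:  D(x) = sigmoid(w*x + c)   (learns w and c)
    """
    rng = random.Random(seed)
    b = 0.0
    w, c = 0.1, 0.0
    for _ in range(steps):
        real = rng.gauss(real_mean, 0.5)
        fake = rng.gauss(0.0, 0.5) + b

        # Discriminator step: gradient ascent on
        # log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w += lr * ((1.0 - d_real) * real - d_fake * fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator step: gradient ascent on log D(fake),
        # i.e. nudge b so the discriminator rates fakes as more real.
        d_fake = sigmoid(w * fake + c)
        b += lr * (1.0 - d_fake) * w
    return b, w, c


if __name__ == "__main__":
    b, w, c = train_toy_gan()
    print("generator offset:", b)
    print("passes the 0.9 bar:", looks_real(b, w, c))
```

Toy setups like this tend to oscillate rather than settle, so the printed values depend on the seed and step count. The point is only the structure: two models updated in turn, each one's progress defined by the other's mistakes.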
In the other examples provided by the researchers, the quality and obviousness of the fake talking head varies widely. Some, which attempt to replicate a person whose image was taken from cable news, also recreate the news ticker shown at the bottom of the image, filling it with gibberish. And the usual smears and weird artifacts are omnipresent if you know what to look for.
That said, it’s remarkable that it works as well as it does. Note, however, that this only works on the face and upper torso — you couldn’t make the Mona Lisa snap her fingers or dance. Not yet, anyway.