SAN DIEGO, Calif. – Do you have a 20-minute file of someone talking? If so, you could use a future Adobe product to put words into their mouth, literally, if this year’s Adobe Sneaks is anything to go on.
Developed by Zeyu Jin, a tech intern with Adobe Systems Inc.’s creative technologies division, Project VoCo appears to have captured the most attention among the features showcased at the company’s annual competition, and with good reason: It brings many a paranoiac’s worst fear to life by making fake speech sound extremely convincing.
“We’ve made a lot of breakthroughs in the past decade with photo editing, right?” Jin told the media during a pre-competition conference. “So why not do the same with speech?”
To demonstrate Project VoCo’s capabilities, Jin played an audio clip of comedian Keegan-Michael Key (not coincidentally the frequent partner of Jordan Peele, who hosted this year’s Adobe Sneaks event) delivering a joke mid-conversation regarding the day he won an award:
“I jumped out of bed, and I kissed my dogs, and my wife, in that order – ”
Here Jin stopped the recording and demonstrated how, simply by typing into the Project VoCo interface, which displays a transcript of the audio file as editable text, he could make Key appear to say:
“…and I kissed my wife, and my wife – ”
But that hadn’t been Jin’s intent. Key doesn’t have two wives; Jin had wanted Key to acknowledge his wife first, then his dogs:
“I kissed my wife, and my dogs.”
Next, Jin replaced the words “my wife” with something else:
“I kissed Jordan, and my dogs,” Key now apparently said.
Finally, simply because he could, Jin changed the message once again:
“…and I kissed Jordan three times.”
It must be emphasized – and believe us, we’re trying to be objective here – that each time Jin changed what Key was saying, the new audio sounded natural. It was as if Jin had recruited Key to record the lines so that he could hoodwink some unsuspecting journalists. But no: Jin insisted it really was a synthesized version of Key’s voice speaking the words Jin had typed.
When asked how Adobe would address the inevitable security concerns raised by such a product, Jin admitted that, to an extent, he and the other developers were counting on users to refrain from using the feature for nefarious ends. The team, he noted, believed it would be most useful for producing media such as audiobooks and podcasts.
However, he also said that as the development team strove to improve the feature, making it sound as natural as possible, security would be a leading concern as well.
“We hope that people will have the personal constraint of not using it in a bad way,” Jin said. “At the same time… if [people are] really going to use it in a bad way, we’ll have to find a way to protect the feature and detect its use.”
Jin explained that to effectively replicate a person’s speaking voice, Project VoCo would need to build a library of phonemes (the distinct units of sound that make up spoken words) from a recording of adequate length, around 20 minutes.
Other highlights from this year’s Adobe Sneaks, the Adobe Max conference’s annual showcase of experimental, unexpected, and often useful features, included Project Quick Layout, a prototype Illustrator feature that would automatically rearrange the objects on a poster to accommodate new additions attractively; and Project Clover, a tool for editing VR video from within a VR interface.