November 6

This is me speaking Arabic like a native, thanks to AI.

In this video, John Jeffay is speaking Japanese like a native, thanks to AI. Courtesy D-ID  

I uploaded a video from my phone and within minutes it had cloned my voice, translated my words, and lip-synced them to a lifelike moving image of me. I also did versions in French and Japanese.

This is the latest in a series of technological advances from Israeli startup D-ID, which uses generative AI to create “photorealistic digital humans.”

It enables anyone creating online content – from a tutorial on how to fix your washing machine to a professional marketing video – to break through language barriers and reach a global audience.

I found the Video Translate software very easy to use and the results are remarkably lifelike. The translation is largely accurate, but there are moments when the AI hasn’t quite caught my drift.

What I actually said in my 30-second video was: “I don’t speak French, or Arabic, or Japanese.”  

That came out, in the French version, as “Je ne parle [made-up word] ni arabe, ni [repeats the made-up word].” It then throws in a random “bibliothèque” (library) which I definitely didn’t say.

We all know that AI is great, except when it isn’t. There’s no way currently to iron out such wrinkles, says D-ID, short of rerecording. But the company is working on an upgrade that will allow you to check and amend the translation before it produces the video.

This little glitch doesn’t take away from the fact that it’s extremely clever technology.

Better than subtitles or dubbing

“The idea is to let you create videos in languages you don’t speak,” Ron Friedman, head of content and creative marketing at D-ID, tells ISRAEL21c, “by uploading a single video and automatically translating it, cloning your voice to make an exact replica and matching the lip-sync and the face movements in the original video.

“The result is a video of you speaking a wide range of languages that you don’t necessarily speak.”

It’s aimed both at personal users – to send a birthday greeting in any of the 30 languages available – and at corporate clients, to target many countries with a single campaign.

The simplest way to translate a video has always been subtitles – not too appealing – followed by dubbing – ditto – or recording multiple presenters in multiple languages, which can be very costly.

D-ID’s solution trumps them all. It opens up new markets for business users by providing quick, efficient, affordable and convincing versions of one video in many languages.

So how convincing is it? “I think that as with every AI feature, some people — those who look more closely into every detail — might notice [that it’s AI-generated],” says Tal Ron-Pereg, product director at D-ID.

“But for a regular audience looking at the footage, most of them won’t know it’s AI and there are tips and tricks how to make it look even better. Some angles will look better than others, but mostly it will look very, very real.”

Gets better over time

The AI that powers Video Translate keeps learning and improving.

“Over time the translation and lip-sync capabilities get better,” says Ron-Pereg.

“We will also add additional capabilities that will allow you to review before submission – reading out the translation before you actually generate the video, so you can edit a specific word that you think we’ve translated wrong.

“We will also add a capability for the user to give instructions — for example, the pronunciation of certain words or giving a bit more details about what the video is about and who you are addressing, so the tone and style will fit better.”

In time, Video Translate will be able to handle multiple speakers (at the moment it gets confused by more than one person and can end up cloning two voices into one).

Additional languages will be available, too. Right now, for instance, Video Translate understands Hebrew as an input language but doesn’t yet offer it for output.

De-identification

D-ID started life in 2017 as a company pioneering de-identification (D-ID) – which means fooling facial recognition technology.

Founders Gil Perry (CEO), Sella Blondheim (COO) and Eliran Kuta (CTO) were all veterans of the Israel Defense Forces’ elite 8200 signals intelligence unit.

They modified people’s photos so they were still recognizable to humans but couldn’t be identified by facial recognition algorithms.

The company, based in Tel Aviv, moved on to creating avatars – lifelike digital humans – that can engage in real-time natural language conversations, narrate videos instead of actors and more. It has attracted $48 million in funding, has 90 staff, and a very exciting future, says Friedman.

“First we interacted with computers by typing green text commands onto a black screen. Then came the graphic user interface [GUI] – the mouse, drag-and-drop, scrolling functions that we use today.

“We believe that the future is NUI [natural user interface], in which you’re interacting with your device – your laptop, your phone, your fridge, your car, anything – in a natural way, which means face-to-face conversations.

“Instead of going to a website and scrolling through it, you’ll have a conversation with that website’s avatar. You’ll be able to ask it how to solve a problem with a vehicle. 

“You’ll be able to ask it to open a bank account or order a doctor’s appointment, all through a very natural conversation with an avatar that can see you, that can hear you.

“It recognizes your tone of voice and your body language and it responds to you in the most natural stimulating way.”

