A client needed one of our courses in another language, and they needed it fast. Many of their employees spoke Spanish, and our Hazard Communication course was only available in English. They understood that it could take us some time to deliver the translated course, and the whole project wound up taking a little less than a month. But the actual translation? That was done the same afternoon. We only translated it into Spanish, but we could just as easily have translated it into half a dozen languages at the same time.
And we did it all with some of the latest technology.
We used two tools, ElevenLabs and Synthesia. If you make online courses for a living, this technology is worth understanding, not because of the hype around it, but because of a few specific things it actually changes. I’ll tell you what we found.
The problem we were trying to solve
Before we get into how it works, here’s why it matters to us.
ProTrainings has trained more than three million people in CPR, health, and safety. A good number of those people don’t speak English as a first language. For years, the cost of producing multilingual versions of a course was prohibitive. You’d need to hire translators, find credible instructors in each language, book entirely separate recording sessions, and manage the production pipeline for each one. The math usually didn’t work; some of the quotes we got for translation alone were astronomical.
We eventually worked out a way to get the videos subtitled in a foreign language using a service called Rev.com, and we used it a lot. It had real people transcribing our content in whatever language we wanted. It was a great service, though we still had to make corrections to the human-produced transcripts. We then found a fantastic person on Upwork to record the audio. He’s someone we plan to keep hiring for our Spanish dubbing whenever we have both the need and the time.
But when a deadline is approaching fast and we need something quickly, that’s where ElevenLabs changed the game. Tools like ElevenLabs can now take an existing audio track, translate it, and synthesize the instructor’s voice speaking fluently in Spanish, Mandarin, French, or Swahili. Not a robotic text-to-speech voice. Not a clearly artificial substitute. A voice that carries the same tone and cadence as the original instructor.
We recorded and edited Roy’s English-language videos the same way we always have. The AI did the rest.
What this actually looks like in production
I want to be honest about what “AI voice” covers, because the term gets used to describe several different things.
The first is voice translation. You record in one language, and the AI produces a translated version in the instructor’s cloned voice. We used Synthesia for our HazCom course. It let us take the translation it generated and correct it sentence by sentence. I worked with Ignacio, one of our bilingual team members here at ProTrainings, and he helped me make the translations accurate. Synthesia took it from there, and then I went back in and edited the videos by hand to get them ready for launch. That was the part that took the longest, because our videos also have a lot of on-screen text, and I was changing that to Spanish by hand. But if your course has a legitimate audience in another language and you haven’t been able to reach them because of production cost, this removes that barrier.
The second is voice synthesis for updates. This one’s more nuanced. The ability to update course audio without reshooting, to fix a single sentence without booking a recording session, to correct a detail that changed after the video was made, is genuinely useful. We’ve used it for minor audio corrections, for cleaning up background noise, and for making small changes that would otherwise require scheduling an instructor’s time. And with the latest update on ElevenLabs, which is version 3.0, the voice cloning is so good that I cannot tell which audio is original and which is AI. Sometimes I think a line is AI, and it turns out the instructor actually said it. Either way, the student doesn’t notice, so they aren’t distracted while learning, and that’s the point.
The third is AI avatars, which are virtual instructors that can deliver content without a human in front of the camera at all. This can be great for lectures, and we have tried it out a little on our HazCom course to fill in where we needed a video segment, first in English and later in Spanish. However, an avatar currently can’t do more than talk to the camera. For the kind of hands-on skill demonstration that ProTrainings is known for, it cannot replace what a real instructor does. We film close to the action deliberately. We want the student next to the skill, not watching from a distance. An avatar cannot do that. Yet.
What it doesn’t fix
Here’s what I’ve noticed after spending real time with this technology.
It doesn’t replace the foundational decision of why a student should care about what you’re teaching. That work is still ours to do. The best AI voice in the world won’t save a lesson that doesn’t earn the student’s attention in the first thirty seconds.
The clinical accuracy of what’s being said still has to be right. AI voice is an output channel, not a content reviewer. Whatever goes in comes out sounding authoritative. That puts more responsibility on the people writing the scripts, not less.
And while the voice quality has improved significantly, it’s still possible for students to sense something slightly off, especially in longer courses. We review the course content very carefully before launch to make sure we’re happy with it, and we make changes whenever we are not. We also watch completion rates. If a course using AI voice starts showing drop-off patterns we don’t see in the original, that tells us something.
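If you want to run the same kind of completion-rate check on your own courses, here is a minimal sketch in Python. It is not our actual tooling. The class and function names, the 5-point threshold, the 100-student minimum, and the numbers in the example are all assumptions made up for illustration. It compares the completion rate of the original course against the AI-voiced version and flags the course for a closer look when the gap is larger than you are comfortable with.

```python
from dataclasses import dataclass


@dataclass
class CourseStats:
    """Started/completed counts for one version of a course (hypothetical shape)."""
    name: str
    started: int
    completed: int

    @property
    def completion_rate(self) -> float:
        # Avoid dividing by zero for a version nobody has started yet.
        return self.completed / self.started if self.started else 0.0


def completion_gap(original: CourseStats, ai_voiced: CourseStats) -> float:
    """How far the AI-voiced version trails the original, in percentage points."""
    return (original.completion_rate - ai_voiced.completion_rate) * 100


def needs_review(original: CourseStats, ai_voiced: CourseStats,
                 max_gap_points: float = 5.0, min_started: int = 100) -> bool:
    """Flag the AI-voiced version when it trails the original by too much.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    # With very few students, the rates are too noisy to compare.
    if min(original.started, ai_voiced.started) < min_started:
        return False
    return completion_gap(original, ai_voiced) > max_gap_points


if __name__ == "__main__":
    # Made-up numbers, only to show how the check is called.
    english = CourseStats("HazCom (English)", started=1000, completed=820)
    spanish = CourseStats("HazCom (Spanish, AI voice)", started=400, completed=280)
    print(f"Gap: {completion_gap(english, spanish):.1f} points")
    print("Needs review:", needs_review(english, spanish))
```

A gap on its own doesn’t prove the voice is the cause, since the two audiences may differ in other ways. It only tells you where to go back and watch the course again.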
The honest version of the future
Online education has always had an access problem. The courses that actually teach important skills, the ones that help people do their jobs and potentially save lives, have historically been available only to the people who speak the right language, live near the right instructors, and can afford the right programs.
AI voice doesn’t solve all of that. But it removes one of the barriers that was, until recently, almost entirely a production cost problem.
We had a full course translated into Spanish in an afternoon. That gave millions of potential students, at least theoretically, access to a version they could actually understand. That’s the part that made me take this seriously.
If you make training courses and you haven’t started experimenting with this yet, I’d start with translation. Pick one course. Pick one language you know your students need. See what it takes to produce it. The tools exist. The cost is no longer the excuse it was.
ProTrainings trains people in CPR, health, and safety online and in person. Follow us on LinkedIn to see how we’re using AI to improve our courses.