When ambition meets real-world AI limitations
Not every AI-driven initiative ends with a picture-perfect success story. Sometimes the real value lies in what you learn along the way. That’s exactly what happened with our podcast concept, “AI in Production,” and our attempt to distribute it in multiple languages using AI-based translation and voice cloning tools.
This is not a story about a flawless rollout. It’s a case study of an experiment in AI-powered podcast translation—one that revealed both the massive potential of the technology and its very real limitations.
The core challenge: which language should a tech podcast be recorded in?
The idea was simple: we wanted to create a technology podcast focused on real AI implementations in business. The natural next step was inviting guests we had worked with on actual projects. Very quickly, though, we ran into a challenge that’s common in international environments: our guests communicate in different languages.
Choosing one language would inevitably limit the comfort and natural flow of conversation. We wanted guests to speak freely and confidently, without a language barrier getting in the way. That’s when the idea emerged: record each episode in the language that feels most comfortable for the guest, and then use AI to translate it into three other languages. The goal was to end up with full content in Polish, English, German, and French.
The English version was planned as the primary YouTube release, since it’s the most universal language in tech. YouTube itself also supports real-time captions in 165+ languages, which improves accessibility for international viewers.
Tool selection: ElevenLabs
To run this experiment, we used ElevenLabs—an AI platform for voice and audio workflows that enables automatic transcription, translation, and synthetic voice generation in the target language.
The free plan offered very limited capabilities. We could translate only up to two minutes of content, with no editing options, and the credit limits were far too low to work on a full podcast episode in any realistic way.
Upgrading to the Pro plan unlocked Studio, which allowed real-time editing. The platform automatically generated a transcript and translated it into a predefined target language. Importantly, we could manually adjust both the original and the translated text if needed. It also handled speaker separation well, as long as we defined the number of speakers in advance. Uploading a file into Studio cost a large number of credits, but once purchased, those credits could be used throughout the editing process.

Where the real problems appeared
The first major challenge was translation quality in the context of a natural conversation. A podcast is intentionally informal: people repeat words, correct themselves, change direction mid-sentence, or rephrase on the fly. The model didn’t always handle that well. Some translations felt illogical, and certain fragments required manual corrections to preserve meaning.
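To illustrate the kind of clean-up that informal speech calls for, here is a minimal Python sketch of a pre-processing step that could be applied to a transcript line before translation. It is not something we ran in the experiment and not part of any ElevenLabs feature; the function name `tidy_transcript_line` and the filler list are assumptions made for this example, and a real filler list would have to be language-specific.

```python
import re

# Hypothetical filler words for English; an assumption for this sketch.
FILLERS = {"um", "uh", "erm"}


def tidy_transcript_line(line: str) -> str:
    """Drop standalone fillers and collapse immediate word repetitions."""
    # Remove each filler word together with an optional trailing comma.
    filler_pattern = r"\b(?:" + "|".join(sorted(FILLERS)) + r")\b,?\s*"
    line = re.sub(filler_pattern, "", line, flags=re.IGNORECASE)
    # Collapse runs such as "the the the" into a single "the".
    line = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", line, flags=re.IGNORECASE)
    # Normalize any whitespace left behind by the removals.
    return re.sub(r"\s+", " ", line).strip()


print(tidy_transcript_line("So, um, we we deployed the the model"))
```

A step like this only handles the simplest disfluencies. Self-corrections and sentences that change direction halfway through still need a human editor, which matches what we saw in practice.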
An even bigger challenge was voice quality.
The dubbing feature automatically applied voice cloning, but it was also possible to create custom voice models from provided recordings. In our case, the first four episodes were recorded in Polish, meaning the model “learned” primarily from Polish audio and then had to perform on English and German scripts. To improve results, we created additional voice models based on our guests’ recordings in English and German.
The outcome was mixed. The German version performed best, but still fell short of expectations. The generated voice often differed noticeably from the original, with inconsistent tone and accent. At times, the same speaker sounded like two different people across different parts of the episode. In some moments the voice sounded synthetic—almost like a robotic narrator with reduced natural tone and emotion—while in others it became too fast or unnaturally modulated. Since the same sentence can vary in length between languages, the model sometimes tried to “fill the gap” with strange, unnatural audio artifacts. We could regenerate individual lines, but the result was unpredictable, and a new version was rarely better than the previous one.
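The length mismatch between languages can be made concrete with a small calculation. The Python sketch below is an illustration under stated assumptions, not the way ElevenLabs works internally: for each line it compares the duration of the generated audio with the time slot of the original line, and flags lines that would need a noticeable speed-up or slow-down to fit. The `Segment` type, the function names, and the 0.85 to 1.15 tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float            # seconds, position in the original recording
    end: float              # seconds, position in the original recording
    dubbed_duration: float  # seconds of generated audio for this line


def required_speedup(seg: Segment) -> float:
    """Playback-rate factor needed to fit the dubbed audio into the original slot."""
    slot = seg.end - seg.start
    if slot <= 0:
        raise ValueError("segment must have a positive duration")
    return seg.dubbed_duration / slot


def flag_mismatches(segments, low=0.85, high=1.15):
    """Return (index, factor) for segments whose factor falls outside [low, high]."""
    flagged = []
    for i, seg in enumerate(segments):
        factor = required_speedup(seg)
        if factor < low or factor > high:
            flagged.append((i, round(factor, 2)))
    return flagged


segments = [
    Segment(0.0, 4.0, 4.2),   # fits almost exactly
    Segment(4.0, 8.0, 5.6),   # translation is much longer than the slot
    Segment(8.0, 10.0, 1.2),  # translation is much shorter than the slot
]
print(flag_mismatches(segments))
```

A factor well above 1 means the translated line would have to be rushed, and a factor well below 1 leaves a gap to fill. Those are the two situations in which we heard speech that was too fast or audio artifacts.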
The tool offered three sliders to adjust voice output: Style, Similarity, and Stability/Clarity (depending on the setup). In practice, changes often felt like pure trial and error. Increasing “Style” usually resulted in a higher-pitched voice and unnatural intonation. Higher “Similarity” didn’t consistently produce a closer match. On top of that, some lines became randomly louder or quieter without a clear reason.
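Volume jumps between lines are one of the few problems here that can be checked objectively rather than by ear. As a minimal sketch, assuming each generated line has been exported and decoded to float samples in the range -1.0 to 1.0, the Python code below computes an RMS level per line and flags lines that differ from the median by more than a tolerance. The function names and the 3 dB tolerance are assumptions for this example, and RMS is only a rough stand-in for perceived loudness.

```python
import math
from statistics import median


def rms_dbfs(samples):
    """RMS level in dBFS for float samples in the range [-1.0, 1.0]."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def loudness_outliers(lines, tolerance_db=3.0):
    """Indices of lines whose level differs from the median by more than tolerance_db."""
    levels = [rms_dbfs(samples) for samples in lines]
    reference = median(levels)
    return [
        i for i, level in enumerate(levels)
        if abs(level - reference) > tolerance_db
    ]


# Four synthetic "lines": the third one is much quieter than the rest.
lines = [[0.5] * 100, [0.5] * 100, [0.05] * 100, [0.5] * 100]
print(loudness_outliers(lines))
```

Flagged lines could then be regenerated or level-matched in an audio editor before publishing.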
At that point, we had to ask ourselves a hard question: is this technology mature enough for fully professional, multi-language podcast distribution?

Ethics and consent: AI voice usage must be optional
In any voice cloning workflow, consent is non-negotiable. If a guest did not agree to having their voice modified using AI, we fully respected that decision. Guest comfort and personal image matter more to us than any technology experiment.
In such cases, we used traditional subtitles instead. On YouTube and Spotify, we published the original audio together with automatically generated captions. On our website, we provided subtitles in three additional languages. Every episode that included AI-generated voice was clearly labeled “AI Voice,” and we communicated the use of AI openly. Transparency was just as important as innovation.

Would we recommend AI podcast translation?
The answer isn’t black and white—but overall: yes, as long as expectations are realistic. AI-powered podcast translation and dubbing is a huge opportunity to scale content internationally without recording the same material multiple times. At the same time, it’s important to accept that results may be imperfect, and some listeners may notice unnatural intonation, accent issues, or minor translation flaws.
As these tools see wider use, they receive more data, and their quality tends to improve over time. Platforms like ElevenLabs also offer additional audio editing features worth exploring, so testing and iterating are part of the process.
This project wasn’t a flawless success story. But it was a practical experiment that showed what AI can—and can’t—deliver today in real audio production.

Discover “AI in Production” and see AI in real action
Our podcast is for technology professionals, business leaders, and anyone who wants to understand how AI works in real projects and real-world scenarios. These are conversations about implementations, challenges, and hands-on experience—without marketing fluff, but with concrete knowledge and market examples.
We invite you to explore the results for yourself. Listen to the original recordings as well as AI-translated versions and judge how ready the technology is today for professional multi-language podcast production. If you’re interested in applied AI, this podcast is for you.




