Avatar Technologies

Tim Piatenko

Created on October 29, 2025


Transcript

Avatar Technologies

Generative AI Research and POC

Overview

With the emergence of generative AI technologies for video, we wanted to research the current landscape and understand the various capabilities and limitations, costs, applicability to the current roadmap, the level of effort required to integrate into products, and potential future directions. This is a summary of our findings, with examples of inputs and outputs.

( Reference for criteria to use when evaluating various products: )

Original POC Structure

Original Video

We start with the current BombBomb business and assume we have an existing video from a client.

Direct AI feedback

We want to see if there's a way to get direct feedback with suggested improvements from AI.

Full AI overhaul

Alternatively, can we submit the video to an AI platform and have it re-generate it based on best practices?

Avatar generated

And finally, can we apply digital replica / Avatar technologies to redo the parts of the video we want altered, and stitch the results back into the original seamlessly?

Roadmap features to consider

Personalization

Can we use gen-AI technologies and tools to personalize parts of an original video at scale, changing names, titles, or locations dynamically for each client?

Fallback videos

Can we use Avatar tech to record "fallback" videos that would play when none of the personalization options work?

Remove filler words

Can AI help us remove unwanted content in a video by removing filler words and eliminating awkward pauses?

Persistent eye contact

Can we fix inconsistent eye contact in a pre-recorded video using AI?

Bifurcation of video AI tech

Avatar = Create your digital replica, tell it what to say and how to say it
Generative = You tell us what you want, we try to make it happen

Bifurcation of video AI tech

Generative:
  • Prompt-based
  • Flexible
  • Hallucination-prone
  • Designed to work on the whole video

Avatar:
  • Structured
  • Separates text and video
  • Input-dependent
  • Modular

Examples

Walkthrough

Let's start with a video provided by our very own Kevin Andrews and see what we can do.

Walkthrough

We can use simple tools like OpenCV (cv2) and MoviePy in Python, plus AssemblyAI and Creatomate, to extract frames, extract and transcribe the audio, trim the clip to the parts we want, and recombine the edits with transition effects.

Hey, John, it is Kevin here. Hey. I wanted to reach out to you because I know you are a manager of a sales team, and if you're anything like the other managers I've been talking with, you guys are having a hard time right now. You've invested a lot with the technologies, with the team members that you have, but for some reason it's just not working and it's hard to figure out why. Well, I would love to spend a little bit of time and show you why video is helping people not replace their other technologies, but enhance so that the things you've already invested in, the team you've already invested in, are working the best way possible. Schedule some time with me. Would love to talk more.
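The trimming step needs to know where in the timeline the personalizable words fall. AssemblyAI returns word-level timestamps, so a small helper can locate the span to cut. A minimal sketch, assuming the word objects are plain dicts with `text`, `start`, and `end` keys shaped like AssemblyAI's word timestamps (times in milliseconds); the timing data below is illustrative, not from the real transcript:

```python
def find_word_span(words, target):
    """Return the (start_ms, end_ms) span of the first occurrence of
    `target` in a word-level transcript, or None if absent.
    Punctuation attached to a word ("John,") is stripped before matching."""
    for w in words:
        if w["text"].strip(".,!?").lower() == target.lower():
            return (w["start"], w["end"])
    return None

# Illustrative word timings for the opening of the script:
words = [
    {"text": "Hey,", "start": 0, "end": 280},
    {"text": "John,", "start": 300, "end": 620},
    {"text": "it", "start": 640, "end": 720},
    {"text": "is", "start": 730, "end": 820},
    {"text": "Kevin", "start": 830, "end": 1100},
]
span = find_word_span(words, "john")  # → (300, 620)
```

The same lookup gives the cut points for any name, title, or location flagged for personalization.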

Walkthrough

Now we can do a number of things with these:
  • Audio can be used to clone Kevin's voice using e.g. ElevenLabs
  • Text can be imported into e.g. Python and customized programmatically
  • An individual frame is enough to generate a simple "talking head" consent video
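The programmatic text customization can be as simple as whole-word substitution on the transcript before it goes to the voice or avatar service. A minimal sketch (the substitution pairs are illustrative):

```python
import re

def personalize_script(script, substitutions):
    """Replace personalization tokens (names, titles, places) in a script.
    `substitutions` maps original text -> replacement; matching is
    case-insensitive and whole-word, so "John" never touches "Johnson"."""
    for old, new in substitutions.items():
        script = re.sub(rf"\b{re.escape(old)}\b", new, script, flags=re.IGNORECASE)
    return script

script = "Hey, John, it is Kevin here. I know you are a manager of a sales team."
out = personalize_script(script, {"John": "Dana", "manager": "director"})
```

Each recipient gets their own substitution map, and the rest of the pipeline stays unchanged.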

Walkthrough

We can now generate a replica / avatar for Kevin in e.g. Tavus, and feed the new, updated script to it.

Walkthrough

But ideally, we would only need the part of the clip where the names and titles are: customize that, and stitch it back in using e.g. Creatomate.
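The stitch-back amounts to three sub-clips: original head, regenerated middle, original tail. A small planner makes the cut points explicit before handing them to MoviePy or Creatomate; a sketch (the times are illustrative):

```python
def splice_plan(total_s, replace_start_s, replace_end_s):
    """Given the original clip length and the interval being replaced by a
    regenerated segment, return the ordered edit list as (source, start, end)
    tuples, where source is "original" or "generated". Times in seconds."""
    if not (0 <= replace_start_s < replace_end_s <= total_s):
        raise ValueError("replacement interval must lie inside the clip")
    plan = []
    if replace_start_s > 0:
        plan.append(("original", 0.0, replace_start_s))
    plan.append(("generated", replace_start_s, replace_end_s))
    if replace_end_s < total_s:
        plan.append(("original", replace_end_s, total_s))
    return plan

plan = splice_plan(30.0, 0.3, 0.62)  # replace "John" early in a 30 s clip
```

Each tuple then maps directly onto a `subclip` call in MoviePy or a track element in Creatomate.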

Issues

  • AI-generated clips are usually at least 2 sec long
    • Requires some prompt engineering
  • Need to match FPS and resolution
    • Not hard, just extra processing
  • Need to match transition to original
    • Can be tricky...

Solution 1

  • Use pause identification routines to find better places to splice the original
  • Use First and Last frame AI technology to generate the filler with natural motion
  • Tweak to match better
    • Pick better F&L frames
    • Upscale still frame images
    • Speed up the filler segment
  • Use a combination of techniques and APIs
    • MoviePy for manipulating the clips
      • extract audio, cut, recombine
    • AssemblyAI for transcription
    • Cutout or similar for image upscaling etc
    • ffmpeg for pause identification
    • Vidu for F&L frame → Video filler
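For the pause-identification step, ffmpeg's `silencedetect` filter (e.g. `ffmpeg -i in.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -`; the noise threshold and minimum duration are tuning assumptions) logs silence boundaries to stderr. A small parser turns that log into candidate splice points; the log excerpt below is an illustrative capture:

```python
import re

SILENCE_RE = re.compile(r"silence_(start|end): ([0-9.]+)")

def parse_silences(ffmpeg_stderr):
    """Parse ffmpeg silencedetect log text into (start_s, end_s) pauses."""
    starts, pauses = [], []
    for kind, value in SILENCE_RE.findall(ffmpeg_stderr):
        if kind == "start":
            starts.append(float(value))
        elif starts:
            pauses.append((starts.pop(0), float(value)))
    return pauses

log = """[silencedetect @ 0x55] silence_start: 2.31
[silencedetect @ 0x55] silence_end: 3.02 | silence_duration: 0.71
[silencedetect @ 0x55] silence_start: 7.84
[silencedetect @ 0x55] silence_end: 8.6 | silence_duration: 0.76"""
```

The midpoints of the returned pauses are natural places to splice without cutting speech.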

Solution 2

  • Extract audio from the original (MoviePy)
  • Transcribe the audio (AssemblyAI)
  • Personalize the script where applicable
    • Can use an LLM to identify words or phrases (e.g. names, places, and titles)
  • Use a voice cloning service (e.g. Vidu) to regenerate the audio
  • Use a lip-syncing tool (e.g. Vidu) to match the new audio to the original video

Example of using an LLM to analyze the original script for personalization potential →
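One way to wire up the LLM step: ask for a fixed JSON schema in the prompt and parse the reply. The prompt wording and the `tokens` schema here are our own conventions, not any vendor's API; any LLM client (OpenAI, Anthropic, etc.) can supply the `llm_response` string, and the reply below is illustrative:

```python
import json

# Template to fill with .format(script=...); doubled braces are literal JSON braces.
PROMPT_TEMPLATE = (
    "Identify every word or phrase in the script below that could be "
    "personalized per recipient (names, titles, companies, places). "
    'Respond with JSON only: {{"tokens": [{{"text": ..., "kind": ...}}]}}\n\n'
    "Script:\n{script}"
)

def parse_personalization(llm_response):
    """Parse the model's JSON reply into a list of (text, kind) pairs."""
    data = json.loads(llm_response)
    return [(t["text"], t["kind"]) for t in data["tokens"]]

# Illustrative reply for Kevin's script:
reply = '{"tokens": [{"text": "John", "kind": "name"}, {"text": "manager of a sales team", "kind": "title"}]}'
```

The returned pairs feed directly into the substitution map used for script personalization.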

Solutions pros and cons

Slicing and dicing
  • Modular approach
    • minimize reprocessing
    • remove filler words
    • remove awkward pauses
  • More work to identify good cut points
  • More work on recombination
Redo the whole thing
  • No need for any matching
  • AI for lip-sync is cheaper than full regeneration
  • Fewer steps in general
  • Can't remove anything, only substitute with similar length words

Reimagine?

And finally, we can feed the resulting video to a video generation / reimagining / modification AI platform, e.g. Runway.

Full Avatar generated content

Video → Frame → Consent → Avatar / Replica → Video

Personalization

Video → Audio → Text → Customized → Video

So far:

Fallback content

Just add logic to your Python code
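That logic really can be a one-liner around the personalization pipeline: serve the personalized render when one exists, otherwise the avatar-recorded fallback. A minimal sketch (the recipient ids and URLs are hypothetical):

```python
def choose_video(recipient, personalized_urls, fallback_url):
    """Pick the personalized render for this recipient if one was produced,
    otherwise fall back to the generic avatar-recorded video.
    `personalized_urls` maps recipient id -> rendered video URL."""
    return personalized_urls.get(recipient, fallback_url)

urls = {"john@example.com": "https://cdn.example.com/videos/john.mp4"}
fallback = "https://cdn.example.com/videos/fallback.mp4"
```

The same pattern extends to per-field fallbacks, e.g. dropping only the title when no title substitution is available.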

AI reimagined

Need to be very careful with the prompt...

...

But what about AI feedback?

Getting feedback on existing content is quite difficult... However, getting live coaching and analysis using an Avatar-driven Agent or a "Persona" is possible!

Resources

Code repository

POC Overview

Technical artifacts

POC Outcomes