The Rise of Visual Conversational AI
        
                1. The Rise of
Visual Conversational AI
POWERED BY
OCTOBER 2025            
                        
                2.             
                        
                3. From university spin-out to global leader
                        
                4. ABOUT SPEECH GRAPHICS | RAPPORT
The leading platform for
AI-powered character
animation.
• The #1 audio-driven animation platform for AAA
video games. Licensed to 9 out of 10 of the top global
game publishers.
• The all-in-one AI animation platform trusted by top
global brand characters
• Proven: our core AI technology delivers on its
promise of speech analysis, emotive, accurate expressions,
and lip sync that resonate with users.
                        
                5. From Keys to AI Avatars
Keyboard
• First PCs
• DOS
Mouse
• Apple Mac
• MS Windows
Touch
• iPhone
• Android
Conversation
• ChatGPT
• Siri
Avatar
• Rapport            
                        
                6. Why Visual Conversational AI matters now
6h58m
is the average screen time of adults worldwide.
Our Media is now Visual.
Douyin, TikTok, Instagram
61%
of professionals chat with AI for learning & work.
Conversation is the new UI as we interact with devices through language.
ChatGPT, Siri
                        
                7. Challenges
Building an engaging experience is hard
Latency
• 250ms feels human
• Every step adds latency
• TTS, ASR, LLM, Rendering
• Engagement drops with latency
Avatar Choices
• Avatar choice impacts experience
• Uncanny vs comfortable
• Cold vs Warm
• Photoreal vs Stylised
Costs
• Techstack relies on GPUs
• Constant streaming of TTS and Video
• Always on Microphone
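
The latency point above can be made concrete with a small budget calculation. This is a minimal sketch, assuming the four stages run strictly one after another; the per-stage numbers are hypothetical placeholders, not measurements from Rapport or any other product. Only the 250 ms target and the stage names come from the slide.

```python
# Hypothetical per-stage latencies in milliseconds (illustrative, not measured).
STAGE_LATENCY_MS = {"ASR": 80, "LLM": 120, "TTS": 60, "Rendering": 40}

# Target taken from the slide: a response within about 250 ms feels human.
HUMAN_FEEL_BUDGET_MS = 250


def total_latency_ms(stages):
    """Sum stage latencies, assuming the stages run sequentially."""
    return sum(stages.values())


def over_budget_ms(stages, budget=HUMAN_FEEL_BUDGET_MS):
    """Return how far the pipeline exceeds the budget (0 if within it)."""
    return max(0, total_latency_ms(stages) - budget)


total = total_latency_ms(STAGE_LATENCY_MS)
excess = over_budget_ms(STAGE_LATENCY_MS)
print(f"total={total} ms, over budget by {excess} ms")
```

With these placeholder values the sequential total is 300 ms, 50 ms over the target. A real system may stream and overlap stages, so a plain sum is likely a pessimistic estimate.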
                        
                8. Building integrated AI and avatar tech is extremely complex.
Rapport makes it simple and scalable.            
                        
                9. Demo            
                        
                10. 3 ELEMENTS OF VISUAL CONVERSATIONAL AI
Speech and Language
Real-time Behaviour Modelling
Lip Sync and Expression            
                        
                11. Speech and Language
• Speech Recognition: Transforms spoken words into
accurate digital text.
• Natural Language Understanding: Captures meaning,
intent, and context.
• Natural Language Generation: Creates fluent, human-
like responses.
• Voice Synthesis: Produces realistic, expressive voices for
interaction.
The rapid progress in speech and language technologies,
powered by the LLM revolution, is transforming how we
communicate with machines. Interactions are becoming more
natural, conversational, and human-centered than ever before.            
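
The four components listed above can be read as one conversational turn: audio in, text, intent, reply, audio out. The following is a rough sketch of that flow with stub functions. Every function name and the toy logic inside each one are assumptions made for illustration; none of them represents Rapport's API or any real speech library.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    text: str     # output of speech recognition
    intent: str   # output of natural language understanding
    reply: str    # output of natural language generation
    audio: bytes  # output of voice synthesis


def recognize_speech(audio: bytes) -> str:
    # Stub ASR: pretends the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def understand(text: str) -> str:
    # Stub NLU: a single keyword rule stands in for intent detection.
    return "greeting" if "hello" in text.lower() else "unknown"


def generate_reply(intent: str) -> str:
    # Stub NLG: a lookup table stands in for an LLM.
    replies = {"greeting": "Hello! How can I help?"}
    return replies.get(intent, "Could you rephrase that?")


def synthesize_voice(reply: str) -> bytes:
    # Stub TTS: encodes the reply text instead of producing real audio.
    return reply.encode("utf-8")


def run_turn(audio: bytes) -> Turn:
    """Run one turn through the four stages in order."""
    text = recognize_speech(audio)
    intent = understand(text)
    reply = generate_reply(intent)
    return Turn(text, intent, reply, synthesize_voice(reply))


turn = run_turn(b"Hello there")
print(turn.intent, "->", turn.reply)
```

Keeping each stage behind its own function means any one of them could be swapped for a real service without touching the others.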
                        
                12. Lip Sync and Expression
• Accurate Lip Sync: Matches voice to precise mouth
movements for natural communication.
• Audio-Driven Analysis: Interprets speech tone, pace, and
emotion to drive realistic animation.
• Emotional Expression: Conveys subtle feelings through
facial cues and micro expressions.
• Body Movement and Gestures: Integrates natural posture
and motion to enhance experience.
Interactive AI avatars use audio-driven animation to
synchronize voice, emotion, and movement, creating lifelike,
responsive digital interactions that feel genuinely human.            
                        
                13. Real-Time Behaviour Modelling
• Active Listening: Processes user speech in real time to
detect intent, tone, and emotion.
• Contextual Understanding: Integrates AI output to
interpret conversation flow and context.
• Dynamic Response Generation: Coordinates verbal and
non-verbal reactions instantly.
• Adaptive Behaviour: Adjusts facial expressions, gaze,
and gestures based on live interaction cues.
Real-time behaviour modelling enables avatars to speak,
listen, and react naturally, creating fluid, human-like
conversational experiences driven by both user input and AI
intelligence.            
                        
                14. Trust your Avatar
• Embodiment and Presence: Lifelike avatars create a sense
of real connection.
• Proven by Research: Human-like interaction significantly
increases trust and engagement.
• Better Outcomes: Greater trust leads to improved
communication, learning, and decision-making.
Embodied avatars, those that exhibit human-like appearance
and behaviour, are more likely to elicit trust from users. For
instance, a study in the hospitality industry found that both
cognitive and emotional trust in avatars significantly influenced
customer experience, brand loyalty, and overall satisfaction.
*ResearchGate
                        
                15. USE CASE HIGHLIGHT
Entertainment: Location-based and Kiosks
Our clients are creating interactive
location-based experiences, such as
Madame Tussauds recreating well-known
historical figures that visitors can chat with.
                        
                16. USE CASE HIGHLIGHT
Marketing & CX: Customer experience AI agent avatars
Rapport is partnering
with DruidAI, a leading AI
agent company, to
integrate and deploy
avatars across their
customer experience
enterprise portfolio            
                        
                17. USE CASE HIGHLIGHT
HR and Corporate Learning: Management training
For HR and L&D leaders, Rapport delivers
personalized, real-time soft skills
training with measurable and trackable
insight reports            
                        
                18.             
                        
                19. The Future Is
Interactive
THANK YOU            
                        
                20.             
                        
                21. Rapport is the leader in visual
conversational AI.
Gregor Hofer, CEO
gregor@rapport.cloud