Conference Designing Spaces Sound Design

ON SITE Meidinger-Saal Thursday, May 07, 12:30

From Pixels to Waveforms: How AI Bridges the Gap Between Picture and Sound

Video on demand

In animation and VFX production, sound design often begins where the visual pipeline ends – but what if the pipeline itself could kickstart the audio process? Christoph Groß-Fengels built a set of AI-powered tools that do exactly that: they watch your edit, understand what’s on screen, and suggest sounds before a sound designer even opens a session.

In this 45-minute talk, Christoph demonstrates a custom Python-based workflow designed to slot into existing post-production pipelines. The system automatically detects every cut in a video, extracts key frames, and sends them to a vision AI (Google Vision API) for content analysis – identifying objects, environments, actions, and mood. Based on these visual tags, it searches a local sound effects library for matching audio and assembles a first-pass sound bed. On top of that, it generates a synchronized MIDI file aligned to every edit point, giving composers a rhythmic and structural blueprint that mirrors the visual rhythm of the piece.
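The steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the speaker's implementation: the function names, the histogram-difference cut detector, the Jaccard tag matching, and the default frame rate, tempo, and MIDI resolution are all assumptions made for the example. In a real pipeline the per-frame histograms would come from decoded video frames and the visual tags from the vision API response.

```python
def detect_cuts(frame_histograms, threshold=0.5):
    """Return the indices of frames that start a new shot.

    Assumes each histogram is normalized to sum to 1, so half the
    L1 distance between two consecutive frames lies between 0 and 1.
    """
    cuts = []
    for i in range(1, len(frame_histograms)):
        prev, cur = frame_histograms[i - 1], frame_histograms[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / 2
        if diff > threshold:
            cuts.append(i)
    return cuts


def rank_sounds(visual_tags, library):
    """Rank sound files by Jaccard overlap between their tags and the visual tags.

    `library` is a hypothetical mapping of file name -> list of tags.
    Files with no tag in common are dropped.
    """
    tags = {t.lower() for t in visual_tags}
    scored = []
    for name, sound_tags in library.items():
        sound_set = {t.lower() for t in sound_tags}
        union = tags | sound_set
        score = len(tags & sound_set) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties are broken alphabetically by file name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


def cut_to_midi_ticks(cut_frame, fps=25.0, bpm=120.0, ppq=480):
    """Convert a cut position in frames to a MIDI tick position.

    fps, bpm and ppq (ticks per quarter note) are example defaults.
    """
    seconds = cut_frame / fps
    beats = seconds * bpm / 60.0
    return round(beats * ppq)


if __name__ == "__main__":
    # Three identical frames followed by two frames of a different shot.
    histograms = [[1.0, 0.0, 0.0]] * 3 + [[0.0, 1.0, 0.0]] * 2
    library = {
        "rain_on_asphalt.wav": ["rain", "street"],
        "car_pass_by.wav": ["car"],
        "birds.wav": ["forest", "birds"],
    }
    for cut in detect_cuts(histograms):
        print(cut, cut_to_midi_ticks(cut), rank_sounds(["car", "street", "rain"], library))
```

Placing one MIDI event at each tick position returned by `cut_to_midi_ticks` would give a composer the edit-aligned grid the talk describes. Writing the actual MIDI file is left out here.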

For teams working on animation or VFX-heavy projects, this means the sound department gets a head start while picture is still being finalized. Instead of waiting for picture lock, the pipeline can generate preliminary sound passes that evolve alongside the visual edit – a workflow that mirrors how previs and techvis already function on the image side.

Christoph also covers supporting automation tools he developed: local speech recognition (Whisper) for metadata tagging of audio assets, API-driven loudness normalization to broadcast standards (EBU R128), and automated audio-video muxing – all eliminating repetitive manual steps familiar to anyone running a production pipeline.
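As a rough illustration of the loudness and muxing steps, the sketch below computes the gain needed to reach the EBU R128 programme loudness target of -23 LUFS and builds command lines for the ffmpeg command-line tool. Using ffmpeg is an assumption made for this example; the abstract does not say which tools or APIs the speaker's workflow calls. The function names and file names are hypothetical, and the commands are only constructed, not executed.

```python
R128_TARGET_LUFS = -23.0        # EBU R128 programme loudness target
R128_MAX_TRUE_PEAK_DBTP = -1.0  # EBU R128 maximum permitted true peak


def gain_to_target(measured_lufs, target_lufs=R128_TARGET_LUFS):
    """Return (gain in dB, linear amplitude factor) needed to hit the target."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20)


def loudnorm_command(src, dst):
    """Build an ffmpeg call that uses the loudnorm filter (single pass)."""
    audio_filter = f"loudnorm=I={R128_TARGET_LUFS}:TP={R128_MAX_TRUE_PEAK_DBTP}"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


def mux_command(video, audio, dst):
    """Build an ffmpeg call that combines a video stream with a new audio track.

    The video stream is copied without re-encoding; the audio is encoded to AAC.
    """
    return [
        "ffmpeg", "-i", video, "-i", audio,
        "-map", "0:v:0", "-map", "1:a:0",
        "-c:v", "copy", "-c:a", "aac",
        "-shortest", dst,
    ]


if __name__ == "__main__":
    gain_db, factor = gain_to_target(-17.0)
    print(f"apply {gain_db:+.1f} dB (x{factor:.3f})")
    print(" ".join(loudnorm_command("mix.wav", "mix_r128.wav")))
    print(" ".join(mux_command("edit_v12.mp4", "mix_r128.wav", "review_v12.mp4")))
```

Each returned list could be passed to `subprocess.run` on a machine where ffmpeg is installed. Keeping command construction separate from execution makes the steps easy to log and test in a pipeline.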

Through live demos and real examples, the talk honestly examines where AI delivers genuine time savings, where it produces surprisingly useful creative suggestions – and where human ears, dramaturgical instinct, and artistic judgment remain irreplaceable. A talk for pipeline TDs, production supervisors, sound designers, and anyone curious about where AI fits into the audio side of their production workflow.

Christoph Groß-Fengels, Founder, 42 Sounds

Christoph Groß-Fengels is the founder of 42 Sounds, a Hamburg-based studio specializing in sound branding, audio strategy, and AI-driven audio solutions. With over 20 years of experience in the audio industry, he has created music and sonic identities for brands including BMW, McDonald’s, Siemens, Bosch, TUI, DATEV, EnBW, Kölnmesse, Olympus, and bonprix.

Christoph combines deep expertise in sound design and brand strategy with hands-on software development. He builds custom AI-powered tools and automated workflows that bridge the gap between visual production pipelines and audio post-production – from computer-vision-based cut detection and intelligent sound matching to automated metadata management and loudness compliance.

A regular speaker at industry events, Christoph presented his AI-powered sound design workflows at the Tonmeistertagung 2025 in Düsseldorf to an audience of professional audio engineers and producers. His work has been recognized with awards including the Red Dot Design Award and multiple Better Sound Awards.