🎶 Overview

This demo showcases a simple single-turn audio–language interface built for integration with the WeaveMuse framework.
It uses the open-source NVIDIA Audio Flamingo 3 model for audio understanding, transcription, and sound reasoning.

You can upload an audio file and ask natural-language questions such as:

“What kind of sound is this?”
“Describe the scene.”
“Transcribe any speech.”
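
The single-turn flow described above (one audio file and one question in, one answer out) can be sketched roughly as follows. This is a minimal illustration and not the demo's actual source: `placeholder_model`, `answer_question`, and `build_demo` are hypothetical names, and the placeholder stands in for the real Audio Flamingo 3 inference call, which is not shown here.

```python
from typing import Callable, Optional


def placeholder_model(audio_path: str, question: str) -> str:
    # Hypothetical stand-in for model inference. The real demo would run
    # Audio Flamingo 3 here; this is NOT the model's actual API.
    return f"[model output for {audio_path!r} given question {question!r}]"


def answer_question(
    audio_path: Optional[str],
    question: str,
    model: Callable[[str, str], str] = placeholder_model,
) -> str:
    """Single-turn handler: validate the inputs, then ask the model once."""
    if not audio_path:
        return "Please upload an audio file."
    question = (question or "").strip()
    if not question:
        return "Please enter a question."
    return model(audio_path, question)


def build_demo():
    # Imported lazily so the handler above can be exercised without Gradio.
    import gradio as gr

    return gr.Interface(
        fn=lambda audio, question: answer_question(audio, question),
        inputs=[
            gr.Audio(type="filepath", label="Audio"),  # handler receives a file path
            gr.Textbox(label="Question"),
        ],
        outputs=gr.Textbox(label="Answer"),
        title="Audio Understanding Demo (WeaveMuse Edition)",
    )
```

Calling `build_demo().launch()` would start the interface locally. Keeping the handler separate from the UI means the model call can be swapped in later without touching the Gradio layout.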

Acknowledgment:
Model and research credit goes to NVIDIA for developing the open Audio Flamingo 3 model and the datasets used to train it.
This interface is a simplified demonstration of how such models can be integrated into broader creative AI systems like WeaveMuse.

Tech stack: Gradio + PyTorch + llava + WeaveMuse Integration

Audio Understanding Demo (WeaveMuse Edition)
