Remix songs, explore concepts, and create musical mayhem with continuous and instantaneous control of real-time audio generation.
DEMON (Diffusion Engine for Musical Orchestrated Noise) moves past static, batch-mode "prompt-to-song" boxes. It is our open engine built for continuous frame-level interaction.
DEMON uses a ring-buffer framework to process continuous audio streams, bypassing the multi-second control latency of autoregressive systems.
DEMON exposes six frame-level scalars, including velocity scale, guidance, and SDE blend. Updates propagate in a single tick (~81ms).
DEMON delivers 12.3 generations per second of 60-second songs on a single consumer GPU. No corporate TPU or cloud clusters required.
Timbre and denoise driven live, with the conditioning blended between two text prompts: acoustic deep house and a Daft Punk-style four-on-the-floor.
The XL-turbo (5B) checkpoint with timbre, structure, and denoise manipulated live as the song plays.
Alternative-rock and funk LoRAs refit live into the running TensorRT decoder, with no engine rebuild.
An LLM agent driving the engine through DEMON's MCP server: it reads the live control values, then writes denoise, structure, timbre, and prompt-blend updates on a two-bar cadence to evolve the remix in real time.
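The prompt blend in the captions above can be pictured as an interpolation between two conditioning vectors. The sketch below is only an illustration: this page does not say how DEMON computes the blend, so the linear interpolation, the function name `blend_conditioning`, and the toy three-element embeddings are all assumptions.

```python
def blend_conditioning(embed_a, embed_b, mix):
    """Linearly interpolate two equal-length embeddings.

    mix = 0.0 returns prompt A, mix = 1.0 returns prompt B.
    Linear interpolation is an assumption, not DEMON's documented method.
    """
    if len(embed_a) != len(embed_b):
        raise ValueError("embeddings must have the same length")
    mix = min(1.0, max(0.0, mix))  # clamp so a noisy controller cannot overshoot
    return [(1.0 - mix) * a + mix * b for a, b in zip(embed_a, embed_b)]


# Toy stand-ins for the two text-prompt embeddings (hypothetical values).
deep_house = [0.0, 2.0, 4.0]
four_on_the_floor = [4.0, 6.0, 8.0]

halfway = blend_conditioning(deep_house, four_on_the_floor, 0.5)
print(halfway)
```

An agent or a fader would then only need to write a single `mix` value to move between the two prompts.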
Daydream is using DEMON to power upcoming audio tools for musicians and audio artists. Remix tracks and samples in real time, build interaction-responsive sets, and improvise alongside a generative engine that responds to your performance on the fly.
Map the model's latent space directly to physical hardware faders, knobs, and macro lanes. Because control is applied per frame at 25Hz and updates land within a single tick (~81ms), the audio stream reshapes almost as soon as your hand moves a controller.
Load a track and bend it on the fly. DEMON streams variations continuously with sub-100ms latency, allowing you to re-style and morph arrangements live without breaking your performance flow.
Audition entirely new genre trajectories, continuous key shifts, and unexpected arrangement variations on your timeline without waiting for a render.
Manipulate tracks without destroying the core hook. Isolate and lock vocal elements or primary melodies while continuously diffusing and mutating the underlying instrumental stems beneath them.
Join our Discord to get early previews of new tools, share your work, and join live jam sessions. Whether you are debugging code or producing tracks, this is the place to get feedback and technical support and to find collaborators.
Suno and Udio are batch-mode cloud services: you type a prompt, wait for a finished song to render, and start over if you want to change anything. They are closed-weight APIs with no per-frame control. DEMON takes the opposite approach: it is diffusion-based, runs on open weights (ACE-Step v1.5), and operates on the entire 60-second song latent at once, so it can be steered per frame at 25Hz while it is generating. Practically, a musician can twist a knob and hear the change in about 81 milliseconds. Google's Lyria gets closer to real time but still chunks at roughly 2 seconds and runs only on Google's TPU cloud. DEMON is the only one of these that is open, real-time, locally runnable, and controllable per frame.
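The figures quoted on this page fit together with simple arithmetic. The sketch below assumes that one finished generation is emitted per tick, so the tick length is the reciprocal of the 12.3 generations-per-second throughput. The variable names are illustrative only.

```python
# Figures quoted on this page.
GENERATIONS_PER_SECOND = 12.3  # throughput on a single consumer GPU
LATENT_RATE_HZ = 25            # latent frames per second of audio
SONG_SECONDS = 60              # native song length

# Assumption: one finished generation per tick, so tick length = 1 / throughput.
tick_ms = 1000 / GENERATIONS_PER_SECOND
frame_ms = 1000 / LATENT_RATE_HZ              # audio covered by one latent frame
frames_per_song = SONG_SECONDS * LATENT_RATE_HZ

print(f"tick length:     {tick_ms:.1f} ms")   # 81.3 ms
print(f"frame length:    {frame_ms:.0f} ms")  # 40 ms
print(f"frames per song: {frames_per_song}")  # 1500
```

So a 60-second song is 1,500 steerable frames, and a control change waits at most about one 81ms tick before the engine picks it up.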
Yes. The DEMON engine lives at github.com/daydreamlive/DEMON and wraps ACE-Step v1.5, the open-weights music foundation model from ACE Studio. The split matters: ACE Studio is responsible for the model architecture, the training data, the VAE, the text encoder, and the turbo distillation. Daydream writes the streaming engine that turns that batch model into a 12.3-generations-per-second live instrument. Both pieces are open and free to run locally.
Yes — DEMON is local-first. It delivers 12.3 generations per second of 60-second songs on a single consumer GPU. The reference numbers are on an RTX 5090, but the engine supports cards from 8GB VRAM upward. The decoder runs under TensorRT with refit enabled, which is what lets LoRAs hot-swap into the live engine without a rebuild. The browser demo at music.daydream.live lets you try it without installing anything; the GitHub repo is for running it on your own GPU.
LoRA stands for Low-Rank Adaptation. It is a small file — typically tens of megabytes — that nudges the diffusion model's weights toward a specific style without retraining the whole foundation model. The same idea exists in image generation. DEMON's TensorRT decoder is refit-enabled, so LoRAs hot-swap into the running engine without rebuilding it. Daydream maintains the Synthpop LoRA on Hugging Face (huggingface.co/daydreamlive/synthpop), and community LoRAs like Deathstep are already shipping. Because the swap happens during diffusion rather than at submission time, you can keep one source rhythm anchored while morphing the timbre toward a completely different style live.
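To make "low-rank" concrete, here is a minimal sketch of the standard LoRA update, W' = W + (alpha / r) * B * A, in plain Python. It illustrates the general technique only. It is not DEMON's TensorRT refit path, and the matrix sizes, `alpha`, and the helper names are made up for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying W."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


random.seed(0)
d_out, d_in, rank = 64, 64, 4  # toy sizes, far smaller than a real layer

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
B = [[random.gauss(0, 0.1) for _ in range(rank)] for _ in range(d_out)]
A = [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]

W_styled = lora_merge(W, B, A, alpha=8.0, rank=rank)

full_params = d_out * d_in           # values in the full weight matrix
lora_params = rank * (d_out + d_in)  # values stored in the LoRA file
print(full_params, lora_params)      # 4096 512
```

Only `B` and `A` need to be stored and shipped, which is why a LoRA is small compared with the base model.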
About 81 milliseconds for step-time parameter changes — the time between your hand moving a controller and the audio reflecting it. DEMON exposes a set of step-time scalars (per-frame source preservation, velocity scaling, ODE noise injection, classifier-free guidance rescale, channel gain, APG momentum, DCW scalers) that live in a shared mutable registry every in-flight song slot reads on every forward pass. Writing to that registry takes effect on the next tick for every slot at once, regardless of pipeline depth. Sub-100ms is the threshold where something stops feeling like a render and starts feeling like an instrument. For comparison, Suno and Udio need a full re-render (10+ seconds) for any change, and Lyria chunks at roughly 2 seconds.
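The shared-registry idea can be sketched in a few lines. This is an illustration of the pattern described above, not DEMON's actual code: `CurveRegistry`, `SongSlot`, and the two parameter names used here are hypothetical stand-ins.

```python
import threading


class CurveRegistry:
    """Hypothetical shared, mutable store of step-time scalars."""

    def __init__(self, **initial):
        self._values = dict(initial)
        self._lock = threading.Lock()

    def write(self, name, value):
        with self._lock:
            if name not in self._values:
                raise KeyError(name)
            self._values[name] = float(value)

    def snapshot(self):
        with self._lock:
            return dict(self._values)


class SongSlot:
    """One in-flight generation. It holds a reference to the registry, not a copy."""

    def __init__(self, slot_id, registry):
        self.slot_id = slot_id
        self.registry = registry

    def forward_pass(self):
        # Re-read the scalars on every pass instead of caching them at submission.
        return self.registry.snapshot()


registry = CurveRegistry(guidance=1.0, velocity_scale=1.0)
slots = [SongSlot(i, registry) for i in range(4)]


def tick():
    """Run one forward pass for every in-flight slot."""
    return [slot.forward_pass() for slot in slots]


before = tick()
registry.write("guidance", 2.5)  # e.g. a knob turn between ticks
after = tick()
print([s["guidance"] for s in after])  # [2.5, 2.5, 2.5, 2.5]
```

Because every slot reads the same registry on each pass, a single write reaches all in-flight songs on the next tick, however many slots are in the pipeline.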
ACE-Step is the music foundation model — a 2-billion-parameter Diffusion Transformer (with a 5B XL variant), in the same architectural family as Stable Diffusion 3 and FLUX but for audio. It compresses 48kHz stereo into a 64-channel latent at 25Hz, and in batch form generates a 60-second song in roughly 2 seconds on an A100. DEMON is the streaming engine wrapped around it. ACE Studio writes the model; Daydream writes the engine. The engine adds a ring-buffer scheduler that runs multiple in-flight generations on overlapping timestep schedules, a per-tick batched decoder pass that emits a finished song latent every tick after warmup, a shared mutable curve registry for live parameter changes, native TensorRT engines for 60 / 120 / 240-second song lengths plus a sliding window for longer songs, and refit-enabled LoRA hot-swapping.
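The ring-buffer scheduling described above can be sketched as a toy simulation. It assumes a fixed number of denoising steps per song and one new song admitted per tick. `RingScheduler`, `Slot`, and the step count of 4 are illustrative choices, not values from the DEMON codebase.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    song_id: int
    step: int = 0  # denoising steps completed so far


class RingScheduler:
    """Toy ring buffer holding one in-flight song per denoising step."""

    def __init__(self, num_steps):
        self.num_steps = num_steps
        self.ring = [None] * num_steps
        self.head = 0          # where the next song is admitted
        self.next_song_id = 0

    def tick(self):
        # Admit a new song into the slot at head. That slot is either empty
        # (during warmup) or holds the song that finished on the previous lap.
        self.ring[self.head] = Slot(self.next_song_id)
        self.next_song_id += 1
        self.head = (self.head + 1) % self.num_steps

        # Stand-in for one batched pass: every in-flight song advances one step.
        finished = None
        for slot in self.ring:
            if slot is None:
                continue
            slot.step += 1
            if slot.step == self.num_steps:
                finished = slot.song_id
        return finished


scheduler = RingScheduler(num_steps=4)
emitted = [scheduler.tick() for _ in range(8)]
print(emitted)  # [None, None, None, 0, 1, 2, 3, 4]
```

The first few ticks are warmup. After that, each tick completes exactly one song, because the in-flight songs sit at staggered positions in their timestep schedules.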
The VST/AU plugin is in development. Right now there are two ways to try DEMON: the web demo at music.daydream.live (zero install — load a sample, twist parameters, hear it remix in your browser), and running the engine locally from the GitHub repo for full control. The plugin will let you map DEMON's step-time scalars to your DAW's standard envelope lanes, macro knobs, and MIDI hardware, and integrate streaming generation directly into your project timeline. Join the early-access list at tally.so/r/q4jxo9.
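Until the plugin ships, the mapping from a hardware controller to an engine parameter can be pictured as a simple range conversion. The sketch below relies on the fact that MIDI control-change values are 7-bit (0 to 127). The CC numbers, parameter ranges, and the `handle_cc` helper are hypothetical and do not describe the plugin's real interface.

```python
def cc_to_scalar(cc_value, lo, hi):
    """Map a 7-bit MIDI CC value (0-127) linearly onto the range [lo, hi]."""
    if not 0 <= cc_value <= 127:
        raise ValueError("MIDI CC values must be between 0 and 127")
    return lo + (hi - lo) * cc_value / 127


# Hypothetical assignment of CC numbers to parameters and ranges.
CC_MAP = {
    1: ("denoise", 0.0, 1.0),
    74: ("timbre", 0.0, 1.0),
    71: ("guidance", 1.0, 5.0),
}


def handle_cc(controller, value, params):
    """Update params in place for a mapped controller and ignore unmapped ones."""
    if controller in CC_MAP:
        name, lo, hi = CC_MAP[controller]
        params[name] = cc_to_scalar(value, lo, hi)
    return params


params = {}
handle_cc(1, 127, params)  # fader fully up
handle_cc(71, 0, params)   # knob fully down
handle_cc(99, 64, params)  # unmapped controller, ignored
print(params)              # {'denoise': 1.0, 'guidance': 1.0}
```

In a live setup, each converted value would be written to the engine's step-time scalars as the controller moves.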