Copyright © 2026 Daydream

Lessons & guides

Learn Daydream

Short guides and lessons on Daydream, the engine underneath, and getting it running in your DAW.

Contents

01 · Getting Started · 4 guides

02 · Learn the VST

↳ Controls · 19 controls

↳ DAW Guides · Coming soon

↳ Troubleshooting · Common fixes

03 · Use cases · 4 guides

04 · Max for Live · Soon

05 · Under the hood · 2 lessons

06 · Glossary · 10 terms

6 sections · 17 guides · 10-term glossary

Section 01 · 4 guides

Getting Started

Download Daydream Effect, install it in your DAW, and connect your account. Up and running in a couple of minutes.

Guide · 2 min read

What Daydream does

Daydream is the first AI-native instrument. Built on recent audio synthesis research, it gives you near-real-time control over audio generation. It's registered as an Effect in most DAWs, but it behaves more like an instrument: it continuously generates music from whatever audio you feed it.

Guide · 2 min read

Install Daydream Effect

Download the plugin and get it showing up in your DAW. A couple of minutes, start to finish.

Guide · 1 min read

Get your Daydream API key

Your key links the plugin to your account. Grab it from your Daydream sign-in, once.

Guide · 1 min read

Enter your key and start a stream

Paste your key into the plugin, connect, and start remixing audio in real time.

Section 02

Learn the VST

Daydream inside your DAW as a VST/AU plugin. The plugin is in alpha — start with what it does, then open the DAW Guides for host-specific steps and Controls for what every knob, slider, and switch does.

The live controls, drawn the way they look in the app. Open any card for what the control does and its range.

Core

Strength

0 to 1.0, step 0.05

Web app default 0.7

Also called denoise in the engine, “Remix strength” in the MIDI menu, !strength on the radio.

How hard the model reshapes your source audio.

Strength is the most expressive control on the panel. The engine calls it denoise under the hood, but the knob just says Strength. Keep it low and you get a subtle remix that stays close to your original. Push it up and the model takes over, all the way to a full transformation.

Why it matters

It's the first knob to reach for. Sweeping it live while a track plays is the fastest way to feel what Daydream actually does.

Think of it like

Like the wet/dry mix on an effect, except the wet signal is the model reimagining your track from scratch.

Structure

0 to 1.0

Defaults to 1.0

Also called hint_strength in the engine, !structure on the radio.

How closely it follows the song's arrangement.

Structure decides how tightly the output tracks your original's sections, rhythm, and dynamics (the engine calls it hint_strength). Turn it up to keep the arrangement intact. Bring it down and the model is free to rearrange, and near zero it stops following your track and starts writing its own.

Why it matters

This is how you choose whether a remix stays on the rails of the original song or wanders off to dream up something new.

Think of it like

Like a follow-the-chart dial: full means stick to the original, low means take liberties.

Timbre

0 to 1.0, step 0.05

Also called timbre_strength in the engine.

How much of the original's instrument character carries through.

Timbre sets how much of your source's tone and color survives into the output (the engine calls it timbre_strength). High keeps the original instruments recognizable. Low frees the model to swap them for whatever fits the prompt.

Why it matters

It separates what is being played from what it sounds like, so you can keep the arrangement while the model recasts the actual instruments.

Think of it like

Like reamping a part: same performance, but you decide how much of the original tone bleeds through.

Also called Tags A / Tags B in the app; Send Tags commits it.

The text that tells the model what to generate.

In Daydream the prompt field is labelled Tags: a short description of genre, mood, instruments, and tempo. You can run two sets, Tags A and Tags B, and crossfade between them live. Editing the text doesn't send it on its own; you hit Send Tags to commit it.

Why it matters

Specificity wins. “Deep house, muted bass, warm rhodes” lands far closer than “electronic beat.” Re-roll the Seed for a fresh take on the same tags, or lock the Seed to reproduce one exactly.

Think of it like

Like calling out a vibe to a session band. The clearer the brief, the closer the first take.

Also called “style” in casual use.

A small add-on file that teaches the model a style.

A LoRA (Low-Rank Adaptation) is a small add-on file, far smaller than the model itself, that nudges it toward a particular genre or sound without retraining the whole thing. In the app they live in the LoRA Library; you can enable up to four at once, each with its own strength fader, and they stack. A prompt is your instruction for this take. A LoRA is a baked-in aesthetic the model carries across every prompt.

Why it matters

Pick the style with a LoRA, then steer the specifics with tags. 16 genre LoRAs ship out of the box, and they hot-swap into the running engine in about 1.2 seconds, so you can audition styles while the music plays.

Think of it like

A session player who can do anything, taking a quick lesson in one genre: cheap to teach, fast to swap, and it colors everything they play.
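The stacking described in this card can be sketched with the standard low-rank update that gives LoRA its name: the adapted weight is the original plus strength times the product of two thin matrices. This is a toy illustration in plain Python, not Daydream's code. `apply_loras`, `lora_param_count`, `MAX_ACTIVE`, and the tiny matrices are hypothetical names and values, and how the engine actually merges adapters is an assumption.

```python
# Toy sketch of stacking LoRA adapters on one weight matrix.
# Names and shapes are illustrative assumptions, not Daydream's API.

MAX_ACTIVE = 4  # the guide says up to four LoRAs can be enabled at once


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_loras(weight, loras):
    """Return weight + sum(strength * B @ A) over the enabled adapters.

    Each adapter is (B, A, strength): B is d x r and A is r x d, with r much
    smaller than d, which is why an adapter file is small.
    """
    if len(loras) > MAX_ACTIVE:
        raise ValueError("at most four adapters can be active")
    out = [row[:] for row in weight]
    for b, a, strength in loras:
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                out[i][j] += strength * value
    return out


def lora_param_count(d, r):
    """Parameters in one rank-r adapter for a d x d layer."""
    return 2 * d * r


base = [[1.0, 0.0], [0.0, 1.0]]
house = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)    # rank-1 adapter at half strength
techno = ([[0.0], [1.0]], [[4.0, 0.0]], 0.25)  # second adapter, stacked on top
merged = apply_loras(base, [house, techno])
```

For a 1024 x 1024 layer, a rank-8 adapter in this sketch holds `lora_param_count(1024, 8)` values against 1024 * 1024 for the full matrix, which is the sense in which the file is far smaller than the model.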

Modulation & engine

Feedback

0 to 1.0

0.3–0.5 sweet spot

How much each new generation echoes the last.

Feedback sets how similar each new generation is to the previous one. Low gives you fresh variety on every refresh; higher gives a continuous evolution where each generation flows into the next.

Why it matters

It's the difference between constant reinvention and a smooth, evolving morph. 0.3–0.5 is the sweet spot for continuity without everything sounding the same.

Feedback depth

1 and up

Defaults to 1

How far back in time the feedback reaches.

Feedback depth sets how far back the Feedback knob looks. At 1 (default) it blends with the most recent generation; higher values reach back several ticks for an echo or ghost effect, where a faint repeat of an earlier moment surfaces in the current output.

Why it matters

It lets you get distant, ghostly feedback without cranking Feedback all the way up.
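One way to picture how Feedback and Feedback depth interact is a short history buffer: each new generation is mixed with the one stored `depth` ticks earlier. The sketch below is a guess at the shape of the idea, not the engine's feedback path. The `FeedbackBlend` class, the linear mix, and the list-of-floats stand-in for a generation are all assumptions.

```python
from collections import deque


class FeedbackBlend:
    """Blend each new generation with the one from `depth` ticks back.

    Hypothetical sketch: the linear mix and the list-of-floats "generation"
    are assumptions, not the engine's actual behavior.
    """

    def __init__(self, feedback=0.4, depth=1):
        if not 0.0 <= feedback <= 1.0:
            raise ValueError("feedback must be between 0 and 1")
        if depth < 1:
            raise ValueError("depth must be 1 or more")
        self.feedback = feedback
        self.history = deque(maxlen=depth)  # oldest entry is `depth` ticks back

    def step(self, fresh):
        """Mix `fresh` with the stored generation, then remember the result."""
        if len(self.history) == self.history.maxlen:
            past = self.history[0]
            mixed = [(1.0 - self.feedback) * f + self.feedback * p
                     for f, p in zip(fresh, past)]
        else:
            mixed = list(fresh)  # not enough history yet: pass through
        self.history.append(mixed)
        return mixed
```

At depth 1 every output leans on the one just before it, so a sound decays smoothly. At a higher depth the blend skips the most recent outputs and picks up an older one, which is the echo or ghost effect the card describes.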

Where the model concentrates its work across denoising.

An advanced control that changes where the model focuses effort across the denoising steps. The default is tuned for the turbo engine and works well in most cases.

Why it matters

Leave it alone unless you're chasing a specific feel — it's a fine-tuning knob, not an everyday one.

tunable

Default 8 (turbo balance)

How many diffusion steps each generation runs.

The diffusion step count. Fewer steps means lower quality; more steps means more latency. Changing it rebuilds the streaming pipeline, so expect a brief audio glitch when you move it.

Why it matters

It's the direct quality-versus-latency trade. Most of the time the default is right; raise it only if you can spare the latency.

1 to 8

Concurrent denoising slots in the streaming ring buffer.

How many generations the StreamDiffusion ring buffer keeps in flight at once. Low depth means faster parameter-update latency (best for snappy, discrete changes); high depth means higher throughput, smoother glides, and better GPU use. It's capped to the engine's max batch size.

Why it matters

It tunes the engine to your playing style: shallow for stabby, reactive moves; deep for liquid, continuous sweeps.

3–8 useful

Needs RCFG on

How hard the output is pushed toward the prompt.

Classifier-free guidance (CFG) strength. It only takes effect when RCFG mode is set to anything other than Off. Higher values push the output further toward the prompt at the cost of more artifacts. The turbo model is CFG-distilled, so the useful range is narrower than a base model's.

Why it matters

It's your prompt-adherence dial — but turbo likes a light touch, around 3 to 8.

0 to 1.0

Tames the harshness that high guidance can add.

After CFG is applied, this mixes the guided signal's loudness back toward what the un-pushed pass produced. 0 keeps raw CFG; 1 fully snaps the magnitude back. Pair it with high guidance to keep the prompt-push without the harshness high CFG causes on its own.

Why it matters

It lets you chase strong prompt adherence without the output turning brittle or clipped.
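The guidance and rescale steps can be sketched with the textbook classifier-free guidance formula followed by a spread-matching mix. This is a minimal illustration on plain lists of floats, not the engine's implementation. `guide` and `rescale` are hypothetical names, and treating the prompted (conditional) prediction as the "un-pushed pass" is an assumption.

```python
from statistics import pstdev


def guide(uncond, cond, scale):
    """Classifier-free guidance: move from the unprompted prediction
    toward (and past) the prompted one by `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def rescale(guided, cond, amount):
    """Pull the guided signal's spread back toward the prompted pass.

    amount=0 returns raw CFG; amount=1 fully matches the reference spread.
    Using the prompted pass as the reference is an assumption.
    """
    guided_std = pstdev(guided)
    if guided_std == 0.0:
        return list(guided)
    factor = pstdev(cond) / guided_std
    matched = [g * factor for g in guided]
    return [amount * m + (1.0 - amount) * g for m, g in zip(matched, guided)]


uncond = [0.0, 0.0]
cond = [1.0, -1.0]
raw = guide(uncond, cond, 4.0)     # guidance inflates the signal
tamed = rescale(raw, cond, 1.0)    # full rescale restores the original spread
```

In this toy case a scale of 4 quadruples the signal's spread, and a rescale of 1.0 brings it back to the prompted pass's level while keeping the direction guidance pushed it in.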

Whether guidance is on, and in what mode.

Off means no guidance — the turbo default. The other modes re-introduce classifier-free guidance at near-zero cost over the baseline, which is what brings the Guidance scale and CFG rescale knobs to life.

Why it matters

It's the master switch for prompt-guidance. Off is fastest; turn it on when you want the prompt to bite harder.

experimental

Experimental band scalers for the model's self-correction.

DCW is an internal correction the model applies to itself during generation. The low and high knobs adjust its strength in each band — low acts in the early part of the run, high in the later part. The exact audio mapping is still being explored.

Why it matters

Pure sound-design territory: sweep it to discover what it does to your source. Extreme values can be unpredictable, but interesting.

Experimental tone

0 = off · 5–15 by ear

Tilts the sound brighter, with more highs.

An activation-steering knob: it nudges the model's internal representation toward a brighter spectrum (a higher spectral centroid), independent of the prompt. 0 is off.

Why it matters

A direct tone-shaping move that works whatever you've prompted — reach for it when a mix needs air. Useful range is roughly 5 to 15 by ear.

0 = off · 5–15 by ear

Tilts the sound warmer, toward the bass.

An activation-steering knob that shifts the spectrum toward the low end for a warmer feel. The counterpart to Bright. 0 is off.

Why it matters

Pulls the tone down into the chest without touching the prompt. Useful range is roughly 5 to 15 by ear.

0 = off · 5–15 by ear

Adds grit and noise to the texture.

An activation-steering knob that increases spectral flatness — grittier, noisier output. The effect builds slowly as you push it. 0 is off.

Why it matters

Dirties a clean generation up, useful when something sounds too polished. Useful range is roughly 5 to 15 by ear.

0 = off · 5–15 by ear

Thins the texture toward sparse and minimal.

An activation-steering knob that thins the sound toward a sparser, more minimal texture. 0 is off.

Why it matters

Opens space in a busy generation, pulling it toward minimal arrangements. Useful range is roughly 5 to 15 by ear.

per channel, default 1.0

⚠

Experimental feature. These are not traditional audio channels and gains — they manipulate different dimensions of the model's latent space, and produce results ranging from nuanced and beautiful to abrupt and discordant. Use at your own risk.

Steer individual dimensions of the model's latent space.

Not traditional audio channels or gains. Channel highlights nudge individual latent channels (ch13, ch14, and friends); channel groups (ch g0–g7) move whole bands at once. Each defaults to 1.0 — turn one and you push the model along a dimension that has no neat audio name.

Why it matters

It's the deepest steering Daydream exposes. Reach for it when the prompt and tone knobs can't get you somewhere, and treat the result as discovery rather than control.

Section 03 · 4 guides

Use cases

Four ways people put Daydream to work — building sample libraries, processing parts in a production, designing sound, and using it as a creative partner.

Guide · 2 min read

Sample creation

Build coherent sample packs by performing Daydream live and pulling the moments that work.

Guide · 2 min read

Production

Use Daydream as an instrument in your session — play a part into existence rather than arranging it.

Guide · 2 min read

Sound design

Turn arbitrary input — field recordings, found sounds, noise — into designed sound, performed in real time.

Guide · 2 min read

Creative partnership

Play Daydream as an instrument — set conditions, listen to what comes back, and respond.

Section 04 · Coming soon

Max for Live

Daydream as a Max for Live device, native to Ableton's session view and clip workflow.

Coming soon

Section 05 · 2 lessons

Under the hood

How the engine works, for the curious. You don't need any of this to play, but it helps to know what the knobs are talking to.

01 · 1:32 · 9:16

StreamDiffusion, but for audio

What StreamDiffusion did for images, the engine does for sound: batch denoising and a ring buffer turning rapid denoising into real-time audio.

02 · 0:52 · 9:16

The windowed VAE, explained

How the engine decodes audio in overlapping windows so generation stays fast without the sound falling apart at the seams.

Section 06 · 10 terms

Glossary

Plain-language definitions for the engine, the tech, and the gear behind it all — the words the lessons throw at you, in one place.

The engine under the hood

The real-time engine that makes the music playable.

The engine is the runtime and control layer behind the instrument — “StreamDiffusion, for audio.” It takes a model that would normally render a song in one batch and makes that generation streamable and steerable as it plays. It's open source, and it can run on your own GPU.

Why it matters

This is why Daydream feels like an instrument instead of a render queue. You move a knob and hear the change, instead of submitting a prompt and waiting.

Think of it like

If the model is the band, the engine is the live mixing desk that lets you ride the faders mid-performance.

The open music model the engine actually runs.

ACE-Step v1.5 is the open-source model that writes the music, and the Daydream engine wraps around it. The split is worth knowing: ACE-Step composes, the engine makes it playable in real time. The default checkpoint is a 2B-parameter turbo model (a larger XL version also exists), released by the ACE-Step team under an MIT license.

Why it matters

The split tells you what's swappable. The engine stays put while the model underneath can change.

Think of it like

ACE-Step is the songwriter. The engine is the touring rig that lets you perform what they wrote, reshaped, every night.

The tech, in plain terms

Generating sound by clearing away noise, step by step.

A diffusion model starts from pure noise and removes it in small steps until coherent audio emerges. It learned the trick by watching real audio get buried in static and practicing the reverse. Music models do this in a compressed space rather than on raw waveforms, and the turbo model finishes in just 8 steps.

Why it matters

It's why the Strength knob feels the way it does. You're telling the model how far back into the noise to start, which is how much room it has to reinvent before it settles.

Think of it like

Like a Polaroid developing, sharpening into focus a little more with each pass.
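The link between Strength and "how far back into the noise to start" can be sketched the way image-to-image diffusion tools usually do it: bury the source in noise in proportion to strength, then run only the matching share of the steps. This is an assumption about the general technique, not a description of the engine's scheduler. `steps_to_run` and `noised_start` are hypothetical helpers, and the 8-step default comes from the turbo model mentioned above.

```python
def steps_to_run(strength, total_steps=8):
    """Map a 0..1 strength to how many denoising steps are actually run.

    Assumes the common image-to-image convention: strength 1.0 starts from
    pure noise (all steps), strength 0.0 leaves the source untouched.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return round(strength * total_steps)


def noised_start(source, noise, strength):
    """Bury the source in noise in proportion to strength (linear mix)."""
    return [(1.0 - strength) * s + strength * n for s, n in zip(source, noise)]
```

Under these assumptions a low strength leaves most of the source in the starting point and gives the model only a step or two to change it, while a strength of 1.0 hands it pure noise and the full run.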

The real-time-image trick the engine borrows for audio.

StreamDiffusion made image diffusion run in real time by keeping several generations in flight at once, each frozen at a different stage of denoising on one assembly line, all advanced by a single pass. The Daydream engine is the audio version of that idea. Its “ring depth” is how many generations it keeps in flight, from 1 to 8.

Why it matters

It's the reason you get continuous, gap-free audio you can steer live. The engine is always working ahead, so the next sound is ready the moment you need it.

Think of it like

A kitchen line where every station works a different dish at once, so plates come out steadily instead of one at a time.
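The assembly-line idea can be simulated in a few lines: several generations sit at different stages, one pass advances all of them, and a finished one drops out whenever a slot reaches the last step. This is a simplified model for intuition only. `simulate_ring` is a hypothetical function, the even staggering of slots is an assumption, and it covers only the throughput side of ring depth, not parameter-update latency.

```python
def simulate_ring(depth, steps, ticks):
    """Advance `depth` in-flight generations by one step per tick.

    Returns the ticks at which a finished generation is emitted.
    Simplified model: slots are staggered evenly across the step range.
    """
    # Slot i starts partway through, so slots finish at different ticks.
    progress = [(i * steps) // depth for i in range(depth)]
    emitted = []
    for tick in range(1, ticks + 1):
        for slot in range(depth):      # one batched pass advances every slot
            progress[slot] += 1
            if progress[slot] == steps:
                emitted.append(tick)   # this generation is done
                progress[slot] = 0     # the slot starts a fresh generation
    return emitted
```

With 8 steps, a depth of 1 in this model finishes a generation every eighth tick, a depth of 4 every second tick, and a depth of 8 on every tick.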

The codec that shrinks audio so the model can work fast.

A VAE compresses audio into a small code the model paints in, then decodes it back to a waveform. To stream without gaps, the engine decodes in overlapping one-second windows and keeps only the middle slice, trimming the edges that exist just to avoid seams. Decoding only the window you need, instead of the whole song, is what keeps the latency low.

Why it matters

It's the unglamorous part that makes “no clicks, no gaps, no waiting” actually true.

Think of it like

Like crossfading loops in a DAW so the splice is inaudible. The decoder overlaps its chunks for the same reason.
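The "decode a window, keep the middle" scheme comes down to index bookkeeping, sketched below. This is an illustration of the general overlapping-window technique, not the engine's decoder. `window_plan` is a hypothetical helper, the unit is left abstract (samples or latent frames), and the margin size is an assumption since the guide does not state it.

```python
def window_plan(total, window, margin):
    """Plan overlapping decode windows whose kept middles tile the audio.

    Each entry is (decode_start, decode_end, keep_start, keep_end): the
    decoder sees the full window, but only the keep range is written out.
    Windows at the very start and end are clipped to the audio's bounds.
    """
    keep = window - 2 * margin
    if keep <= 0:
        raise ValueError("margin must be less than half the window")
    plan = []
    keep_start = 0
    while keep_start < total:
        keep_end = min(keep_start + keep, total)
        decode_start = max(keep_start - margin, 0)
        decode_end = min(keep_end + margin, total)
        plan.append((decode_start, decode_end, keep_start, keep_end))
        keep_start = keep_end
    return plan


plan = window_plan(total=100, window=40, margin=10)
```

Each kept slice begins exactly where the previous one ended, so the output has no gaps, and the discarded margins are the only place the decoder lacks context on one side.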

How fast your knob move reaches the audio.

Latency is the gap between your hand and the sound. The engine's per-frame knobs land in roughly 14ms at shallow ring depth, rising to about 81ms at the deepest, with 25 control points every second. Sending a whole new prompt takes longer (around 112 to 649ms, depending on depth) because the model has to converge on the new idea.

Why it matters

Latency this low, well under a tenth of a second, is what makes a tool stop feeling like a render and start feeling like an instrument under your hands. It's also why a knob sweep feels instant while a fresh prompt takes a beat to land.

Think of it like

Like the difference between bending a string, which is instant, and calling a key change to the band, who need a bar to land it.

Your gear and setup

Also called MIDI learn (the bind gesture), MIDI in (the input jack).

Bind a physical knob to any on-screen control.

MIDI is a control protocol. It carries notes, timing, and parameter messages, not audio, and it lets a hardware controller drive software. In Daydream there's no separate setup screen: right-click most knobs, sliders, or buttons, wiggle the physical control you want, and it binds on the spot. That's “MIDI learn.”

Why it matters

It turns Daydream from a mouse-driven app into something you play with your hands. Map Strength to a fader and you're performing the model, not clicking it.

Think of it like

Exactly like MIDI-learning a plugin parameter in your DAW: same gesture, same muscle memory.
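The learn gesture can be sketched as a small state machine over MIDI Control Change messages, which are three bytes: a status byte (0xB0 plus the channel), a controller number, and a value from 0 to 127. The `MidiLearn` class below is a hypothetical illustration of the pattern, not the plugin's code, and the parameter name and range are made up for the example.

```python
class MidiLearn:
    """Bind the next incoming Control Change message to an armed parameter."""

    def __init__(self):
        self.bindings = {}   # (channel, controller) -> (name, low, high)
        self.values = {}     # parameter name -> current value
        self.armed = None

    def arm(self, name, low=0.0, high=1.0):
        """Called when the user right-clicks a control to start learning."""
        self.armed = (name, low, high)

    def on_message(self, status, data1, data2):
        """Handle one 3-byte MIDI message."""
        if (status & 0xF0) != 0xB0:    # only Control Change messages
            return
        key = (status & 0x0F, data1)   # MIDI channel, controller number
        if self.armed is not None:
            self.bindings[key] = self.armed  # the first wiggle wins
            self.armed = None
        if key in self.bindings:
            name, low, high = self.bindings[key]
            self.values[name] = low + (high - low) * (data2 / 127)


learn = MidiLearn()
learn.arm("strength")                 # right-click the knob
learn.on_message(0xB0, 21, 127)       # wiggle a hardware knob: bound and set
```

After binding, every later message from the same channel and controller rescales its 0 to 127 value into the parameter's range, while notes and unbound controllers are ignored.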

Also called Full / Instr / Vocals (the source switch).

Isolated layers, like vocals and instruments, of your source.

A stem is one grouped layer of a song bounced to its own track: all the vocals, or the whole instrumental bed. Daydream pulls vocal and instrumental stems out of your source, so you can mix each layer into the output on its own, or feed just the vocals or just the instruments into generation.

Why it matters

It lets you keep one part recognizable while the model transforms the rest. Lock the vocal and let it rebuild the backing, or the other way around.

Think of it like

Like soloing the instrumental bus versus the vocal bus on a mixing desk.

The plugin format versus the studio it plugs into.

A DAW (Digital Audio Workstation) is the software where you record, sequence, mix, and arrange, like Ableton, FL Studio, or Logic. A VST is a plugin format: add-on software that runs inside a DAW. AU is Apple's equivalent on Mac and iOS. A standalone app is the same tool packaged to run on its own. Daydream's web app runs in your browser today, and the VST/AU plugin is in alpha.

Why it matters

It tells you how Daydream fits your workflow. The browser app needs nothing installed, while the VST/AU plugin drops the engine straight into your DAW session alongside your other plugins.

Think of it like

The DAW is your studio. A VST is a piece of gear that racks into it. A standalone is that same gear on its own stand.

The graphics-card memory you need to self-host.

VRAM is the dedicated memory on your graphics card, and it's the main thing that decides whether a model runs locally, because the model has to fit. You only need it if you want to self-host the open-source engine; the hosted web app needs no GPU at all. The realtime engine's practical floor is around 16GB of NVIDIA VRAM, with a 24GB card like the RTX 4090 comfortable. The benchmarks were run on an RTX 5090 with 32GB.

Why it matters

It draws the line between just opening the browser and running it yourself, and if you do run it yourself, more headroom means lower latency.

Think of it like

Like track count and plugin headroom on an old machine: run out and everything chokes, have plenty and it flies.

Getting stuck?
Ask a fellow musician.

Artists congregate in the Daydream Discord. If you've got a question, chances are someone will know the answer.

Join the Discord