Candy Dungeon Music Forge

User Guide · v0.1 · Early Access

Candy Dungeon Music Forge (CDMF) is a local AI music workstation powered by ACE-Step and a custom UI designed to make generating, tweaking, and curating your music a smooth and cohesive experience. This guide explains how to install CDMF, generate tracks, manage your library, and train LoRAs.

Local-first · Windows · ACE-Step text → music · LoRA training · Stem separation · Dataset tools

Contents

Overview
System requirements
Installation & first launch
UI tour
Generating music
Vocal / instrumental stem control
Training LoRAs
Dataset mass-tagging tools
MuFun-ACEStep analyzer (experimental)
Troubleshooting & FAQ

1. Overview

Candy Dungeon Music Forge (CDMF) is a local AI music workstation for people who actually like owning their tools. It runs on your Windows PC, uses your GPU, and keeps all audio and prompts on your hardware.

What CDMF is built on

ACE-Step – the diffusion engine that turns prompts + lyrics into audio.
PyTorch – the deep learning runtime used by ACE-Step and related models.
Qwen-like LLM backend (via your configured model) – used for the “Generate prompt / lyrics…” helper.
audio-separator – used for post-process stem separation (vocals vs. instruments).
MuFun-ACEStep (optional) – an analyzer that can auto-create prompt/lyrics files for datasets.

What makes CDMF more than “just a wrapper”

Sleek generation UI with Core and Advanced sections, presets, and clear tooltips so you don’t have to memorize every ACE-Step knob.
Built-in music player + library view – browse, sort, favorite, and categorize every generated track.
Preset system – save, load, and share your favorite generation settings.
LoRA training UI – configure and kick off ACE-Step LoRA training runs without hand-editing Python scripts.
Dataset helpers – bulk create prompt/lyrics files or auto-generate them with MuFun-ACEStep.

A typical workflow looks like:

Launch CDMF → wait for first-time setup (venv + packages + ACE-Step models).
Use Generate Track to create songs from prompts (optionally with lyrics).
Browse, favorite, and categorize tracks in the Music Player.
(Optional) Use stem controls to tweak vocal vs. instrumental levels.
(Optional) Build datasets and use the Training tab to train custom LoRAs.

2. System requirements

Minimum

Windows 10 or 11 (64-bit)
NVIDIA GPU (RTX strongly recommended)
~10–12 GB VRAM (more gives more headroom)
SSD with tens of GB free (models + audio + datasets)

Comfortable experience

Modern RTX card with plenty of VRAM (e.g. 12–24 GB)
32 GB system RAM
Fast NVMe SSD for models and datasets
Some comfort with “power user” tools and reading console logs

Note: The very first launch does a lot: creates a virtual environment, installs Python packages, and downloads ACE-Step and related models. This can take a while. All of that work is reused on later launches.

3. Installation & first launch

3.1 Installing CDMF

Download the CDMF archive (ZIP) from your store of choice (e.g. itch.io or Gumroad).
Extract the ZIP somewhere convenient (e.g. C:\CDMF_Installer).
Inside you’ll find the main installer (e.g. CandyDungeonMusicForge-Setup.exe) and any support files.
Double-click the installer and follow the prompts. By default, CDMF installs under:
%LOCALAPPDATA%\CandyDungeonMusicForge
When setup finishes, you’ll have:
- A Start Menu shortcut to Candy Dungeon Music Forge.
- An optional desktop shortcut (if you enabled it).

The installer does not bundle a full venv. Instead, it ships a slim embedded Python and a requirements_ace.txt file. CDMF will build a fresh venv_ace the first time you run it.

3.2 First launch: what you’ll see

Launch CDMF from the Start Menu or desktop shortcut.
A console window titled something like “Candy Dungeon Music Forge – Server Console” will appear. This window must stay open while CDMF runs.
CDMF immediately opens a loading page in your default browser while the backend is starting.
On first run, the console will:
- Create venv_ace under the app folder.
- Install packages from requirements_ace.txt.
- Install ACE-Step and the PyTorch CUDA stack.
- Set up other helpers like audio-separator.
When the server is ready, your browser will show the full CDMF UI (the template from cdmf_template.py).

Important: Don’t close the console while it’s working. If you see Python / pip errors, read the last messages carefully. Many issues (GPU drivers, missing VC runtimes, low disk space) will show up here.

3.3 Subsequent launches

On later launches, CDMF will:

Reuse the existing venv_ace.
Skip package installs if everything is already in place.
Skip large model downloads unless a feature needs a new one (e.g. MuFun).

4. UI tour

4.1 Title bar & tagline

At the top you’ll see the Candy Dungeon Music Forge titlebar:

Logo on the left.
App title and version (e.g. v0.1) on the right.
A short tagline: “Generate unlimited custom music with a simple prompt and style presets via ACE-Step.”

4.2 Music Player card

The first main card is Music Player. It’s your library view for generated tracks.

Folder: shows the current output directory (the {{ default_out_dir }} template value, which Flask fills in when the page renders).
Category filter chips: a row of colored chips lets you filter by category. These are driven by category labels on your tracks.
Track header row: sortable columns:
- ★ (favorite)
- Name
- Length
- Category
- Created
- Actions

Each track row shows:

A favorite button (★) – click to toggle favorite state.
The track name (based on the WAV filename).
Metadata: length and category (filled by the player scripts after they read the file).
A small trash icon to delete the file from disk.

Tip: You can use the header buttons to sort (e.g. by name or creation time), and use the category filter chips above the list to quickly narrow down to “lofi”, “battle”, “town”, etc.

4.3 Player controls

Below the track list you’ll find:

Time labels – current time and total duration.
Seek bar – click / drag to move around in the track.
Buttons: Rewind, Play, Stop, Loop, Mute.
Volume slider – global playback volume.

The underlying playback uses a hidden <audio id="audioPlayer"> element and a hidden <select id="trackList"> used by the JS logic to keep everything in sync.

4.4 Mode tabs: Generate vs Training

Beneath the player is a small tab strip:

Generate – the main text-to-music UI.
Training – LoRA training and dataset tools.

Only one mode is visible at a time. Behind the scenes, these tabs toggle cards with data-mode="generate" or data-mode="train" and the JS (e.g. cdmf_mode_ui.js) handles the details.

5. Generating music

5.1 Model status

At the top of the Generate Track card, you’ll see:

A Generate button.
A loading bar that animates while a generation is in progress.
A model status notice if ACE-Step isn’t downloaded yet. It will prompt you to click “Download Models” and warn that this is a large download.

5.2 Core vs Advanced tabs

The generation controls are split into:

Core – most of what you need most of the time.
Advanced – scheduler, CFG modes, repaint/extend, audio2audio, LoRA internals.

A good mental model: use the Core tab to get high-quality songs without touching anything you don’t understand. The Advanced tab is for experiments and fine-tuning once you’re comfortable.

5.3 Core controls

Base filename

Base filename (basename) is the prefix for your output WAV files. CDMF will append numbers / timestamps as needed so they don’t collide, but the base name is what you’ll see in the player.
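The collision handling described above could be sketched roughly as follows. This guide does not document CDMF's real naming scheme, so the `_2`, `_3` numeric suffix and the helper name `unique_wav_path` are assumptions made purely for illustration.

```python
from pathlib import Path


def unique_wav_path(out_dir, basename):
    """Return a WAV path in out_dir that does not exist yet.

    Hypothetical scheme: basename.wav, then basename_2.wav, basename_3.wav, ...
    CDMF's actual suffix format (numbers and/or timestamps) may differ.
    """
    out_dir = Path(out_dir)
    candidate = out_dir / f"{basename}.wav"
    counter = 2
    while candidate.exists():
        candidate = out_dir / f"{basename}_{counter}.wav"
        counter += 1
    return candidate
```

Whatever the exact suffix, the practical takeaway is the same: reusing a base name never overwrites an earlier track.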

Auto prompt / lyrics

The button “Generate prompt / lyrics…” opens a small modal where you can:

Describe a song concept (“melancholic SNES overworld at night…”).
Choose to generate:
- Prompt only
- Lyrics only
- Prompt + lyrics

CDMF uses an LLM backend to fill in the Genre / Style Prompt box and/or the Lyrics box based on your selection.

When Instrumental is checked, the dialog will default to Prompt only. When it’s unchecked, it leans toward Prompt + lyrics.

Genre / Style Prompt

This is your main ACE-Step prompt. Use it to describe:

Genre and instrument palette (e.g. “16-bit SNES snowfield, chiptune pads…”).
Tempo/mood (“slow, melancholic, wistful but hopeful”).
Context (“looping BGM for JRPG overworld”).

Instrumental vs Vocal presets

Below the prompt field are two preset groups:

Instrumental preset buttons shown when Instrumental is checked.
Vocal preset buttons shown when Instrumental is unchecked.

Each preset sets a bundle of internal knobs (target seconds, steps, guidance, etc.) and may tweak internal “seed vibes” for different sound families. The Random buttons pick from a curated list to keep exploring without you having to think too hard.

Instrumental toggle & lyrics

When Instrumental is checked:
- Lyrics are not used for generation.
- ACE-Step receives a special [inst] token so it focuses on backing tracks.
- The Lyrics box is hidden to keep the UI clean.
When Instrumental is unchecked:
- The Lyrics panel appears.
- You can paste or write lyrics with markers like [verse], [chorus], [solo], etc.

There’s also a Clear button inside the Lyrics row to quickly wipe the lyrics field.
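As a rough mental model of the Instrumental toggle, the lyrics that reach ACE-Step might be chosen like this. The function name is hypothetical, and sending `[inst]` in place of the lyrics text is an assumption based on how this guide describes instrumental datasets; the real request format is not documented here.

```python
def lyrics_for_generation(instrumental, lyrics_text):
    """Pick the lyrics string to send with a generation request.

    Assumption: an instrumental request carries only the [inst] token,
    and anything typed in the (hidden) Lyrics box is ignored.
    """
    if instrumental:
        return "[inst]"
    return lyrics_text.strip()
```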

Target length & fades

Target length (seconds) – slider + numeric box that tells ACE-Step roughly how long the track should be.
Fade in / Fade out (seconds) – small fades applied at the start/end of the final audio.

Tip: 0.5–2.0 seconds is a good fade range for most BGM tracks.
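To make the fade settings concrete, here is a minimal sketch of a linear fade applied to a list of mono samples. CDMF's actual fade curve is not specified in this guide, so the linear ramp and the `apply_fades` helper are assumptions.

```python
def apply_fades(samples, sample_rate, fade_in_s, fade_out_s):
    """Return a copy of samples with linear fade-in and fade-out applied.

    samples is a plain list of floats (mono). A linear ramp is assumed;
    the real implementation may use a different curve.
    """
    n = len(samples)
    n_in = min(n, round(fade_in_s * sample_rate))
    n_out = min(n, round(fade_out_s * sample_rate))
    out = list(samples)
    # Ramp up from silence at the start.
    for i in range(n_in):
        out[i] *= i / n_in
    # Ramp down to silence at the end.
    for i in range(n_out):
        out[n - 1 - i] *= i / n_out
    return out
```

At a 44.1 kHz sample rate, a 1-second fade touches 44,100 samples at each end, which is why short fades are usually enough to hide clicks without eating into the music.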

Core ACE-Step knobs

Inference steps – 50–125 is a good range. Higher is slower and may not always increase quality.
Guidance scale – how strongly ACE-Step follows your text. Extreme values can introduce noise.
BPM (optional) – if set, CDMF adds a hint like tempo 120 bpm to the tags.
Seed + Random checkbox:
- When Random is checked, CDMF picks a random seed each time.
- When unchecked, you can lock a specific seed to re-roll close variations.
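The BPM hint and seed behavior above can be pictured with the small sketch below. Both helper names, the comma used to join the hint to the prompt, and the 32-bit seed range are assumptions; only the `tempo 120 bpm` wording comes from this guide.

```python
import random


def add_bpm_hint(prompt, bpm=None):
    """Append a tempo hint to the prompt tags when a BPM is set."""
    if bpm is None:
        return prompt
    return f"{prompt}, tempo {bpm} bpm"


def choose_seed(random_checked, locked_seed, rng=random):
    """Return a fresh seed when Random is checked, else the locked one."""
    if random_checked:
        # Seed range is an assumption for illustration.
        return rng.randint(0, 2**32 - 1)
    return locked_seed
```

Locking the seed is what makes small changes comparable: with the same seed and settings, only the knob you changed should account for the difference you hear.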

Post-mix vocal / instrumental levels

At the end of the Core section you’ll see:

Vocals level (dB)
Instrumental level (dB)

These are post-process gain adjustments created by running the track through audio-separator and rebalancing stems.

Important: Using stem controls requires downloading a large stem separation model on first use and adds a heavy post-process step. For fastest iteration:

Generate a track at neutral levels (0 dB / 0 dB).
Find a track you like.
Uncheck Random next to Seed so the seed stays locked, and keep all other settings the same.
Re-generate with adjusted vocal / instrumental gains.

5.4 Advanced tab (high-level)

The Advanced tab exposes more ACE-Step internals:

Scheduler type (Euler, Heun, ping-pong).
CFG mode (APG, CFG, CFG★) and related parameters.
ERG switches (tag, lyric, diffusion).
Repaint / extend:
- Task: text2music / retake / repaint / extend.
- Repaint start / end in seconds.
- Retake variance for variations.
Audio2Audio:
- Ref strength
- Ref audio file upload
- Optional explicit source path
LoRA adapter fields (more detail in the Training section). Here you can:
- Pick installed LoRAs from a dropdown.
- Browse for a LoRA folder under custom_lora.
- Set the LoRA weight (0–10).

If you’re new to ACE-Step, you can ignore the Advanced tab entirely. The defaults were chosen to be safe and high quality out of the box.

5.5 Saved presets

At the bottom of the Generate card is a Saved presets block:

My presets dropdown – shows your saved presets after you create some.
Load – apply the selected preset to the current form.
Save – capture the current knobs as a new preset.
Delete – remove a preset.

Presets record both text fields (prompt, lyrics, etc.) and numerical fields (steps, seeds, gains, etc.), so you can quickly return to a particular “vibe kit” without screenshots or manual notes.

5.6 Output directory

The Output directory field controls where WAVs are written. It defaults to the path shown in the Music Player header. If you change this, remember that:

The Music Player lists the directory that the server passed to the page when it loaded.
If you point Output directory somewhere else, you may need to refresh the page or restart CDMF before the player sees the new folder.

6. Vocal / instrumental stem control

CDMF integrates audio-separator so you can rebalance vocals and instrumentals after generation:

Vocals level (dB) – boosts or reduces the vocal stem relative to the original mix.
Instrumental level (dB) – boosts or reduces the backing track stem.

Both use decibel adjustments:

0 dB – leave as-is.
Negative values – make that stem quieter.
Positive values – make that stem louder.

On first use, CDMF will need to download the stem-separation model. This is large and adds a significant processing step. For quick sketching, leave both gains at 0 dB and only use stems once you’re close to a final track.
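The decibel values above follow the standard amplitude convention: a gain of g dB multiplies the signal by 10^(g/20). The sketch below shows that conversion and a simple re-mix of two stems. How CDMF actually recombines the stems is not documented here, so the `remix` helper is an assumption.

```python
def db_to_gain(db):
    """Convert a decibel offset to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def remix(vocals, instrumental, vocals_db=0.0, instrumental_db=0.0):
    """Scale each stem by its gain and sum them sample by sample.

    Both stems are plain lists of floats of equal length (mono).
    """
    gain_v = db_to_gain(vocals_db)
    gain_i = db_to_gain(instrumental_db)
    return [v * gain_v + i * gain_i for v, i in zip(vocals, instrumental)]
```

As a rule of thumb, +6 dB roughly doubles a stem's amplitude and -6 dB roughly halves it, so small adjustments of a few dB usually go a long way.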

7. Training LoRAs

7.1 Training controls overview

Switch to the Training mode tab to see the LoRA controls:

Start Training – submits the training form to the backend and starts ACE-Step’s trainer.
Pause / Resume / Cancel – control an in-progress run. These are wired up to backend endpoints that can pause, resume, or stop training.
Status indicator – small banner and candystripe loading bar that reflect the current state.

Pausing saves a checkpoint and allows resuming later. If you restart the server, the paused state is preserved and you’ll be prompted to Resume or Cancel before starting a new run.

7.2 Dataset setup

The Dataset Setup / Formatting section describes how training datasets should be structured:

Your dataset folder must live under:
<CDMF root>\training_datasets
For each foo.mp3 (or foo.wav) you should have:
- foo_lyrics.txt – lyrics or [inst] for instrumentals.
- foo_prompt.txt – ACE-Step tags for that track.

The UI provides:

A Dataset folder text field.
A Browse… button that uses a hidden folder picker.

You can hand-create these files, use the Dataset Mass Tagging tool to generate them from a base prompt, or use MuFun-ACEStep to auto-tag.
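Before starting a run, it can help to confirm that every audio file has both companion files. The script below is a small stand-alone check you could run yourself; it is not part of CDMF, and the function name is made up for this example.

```python
from pathlib import Path

AUDIO_EXTS = {".mp3", ".wav"}


def find_missing_companions(dataset_dir):
    """List the _prompt.txt / _lyrics.txt files a dataset folder still needs."""
    missing = []
    for audio in sorted(Path(dataset_dir).iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTS:
            continue
        for suffix in ("_prompt.txt", "_lyrics.txt"):
            companion = audio.with_name(audio.stem + suffix)
            if not companion.exists():
                missing.append(companion.name)
    return missing
```

An empty list means the folder matches the layout described above; anything else is the exact set of files to create.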

7.3 Core LoRA training parameters

Experiment / adapter name – a short name like lofi_chiptunes_v1. Used for the output folder under ace_training and the final adapter under your custom_lora hierarchy.
LoRA config (JSON) – choose from JSON presets in the training_config folder. A hidden field keeps the selected path.
Max training steps – upper bound on optimization steps. Usually left high; you control real run length with epochs.
Max epochs – number of full passes over the dataset (e.g. 20).
Learning rate – default 1e-4; values from 1e-5 to 1e-4 are common for LoRAs.
Max clip seconds – max length per audio example. Lowering this can reduce VRAM usage and speed up training.
SSL loss weight – weight for MERT/mHuBERT self-supervised losses. Set to 0 for pure instrumental / chiptune datasets.
Instrumental dataset – checkbox telling the trainer to freeze lyric/speaker-specific blocks and focus on music/texture layers.
Save LoRA every N steps – periodic checkpoint saving, with 0 disabling mid-run saves (but still writing a final adapter).
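To see how Max training steps and Max epochs interact, the sketch below estimates how many optimizer steps a run will take. It assumes the usual PyTorch Lightning behavior of stopping at whichever limit is reached first. The `batch_size` and `grad_accum` parameters are assumptions, since this guide does not state the trainer's batch size.

```python
import math


def estimate_optimizer_steps(num_clips, max_epochs, max_steps,
                             batch_size=1, grad_accum=1):
    """Estimate the optimizer steps a training run will perform.

    Assumes training stops at max_steps or max_epochs, whichever
    comes first, and that a partial final batch still counts.
    """
    batches_per_epoch = math.ceil(num_clips / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return min(max_steps, steps_per_epoch * max_epochs)
```

For example, 100 clips at 20 epochs with a batch size of 1 comes to 2,000 steps, so a Max training steps value in the millions never comes into play. That is why epochs are the practical run-length control.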

7.4 Advanced trainer settings

These map to PyTorch Lightning / ACE-Step trainer internals:

Precision – 32-bit, 16-mixed, or bf16-mixed.
Grad accumulation – virtual batch size multiplier.
Gradient clip value + algorithm – stability tuning.
DataLoader reload frequency – how often to rebuild loaders.
Validation check interval – how often validation runs.
Devices – number of GPUs to use (passed through to ACE-Step).

If you’re not already used to debugging Lightning configs, leave these at their defaults. You’ll get more mileage from good datasets and reasonable learning rates.

7.5 LoRA config help modal

The “LoRA config presets” help modal (triggered by the small ? button) explains the families of configs: light / medium / heavy, base_layers, extended_attn, heavy transformer, full_stack, etc. As a rule of thumb:

Light / base_layers – safest, smaller adapters, subtle style shaping.
Heavy / full_stack – much stronger imprinting and higher overfit risk.

8. Dataset mass-tagging tools

Under Training mode you’ll also see a card for Dataset Mass Tagging (Prompt / Lyrics Templates). This is for quickly building simple prompt/lyrics files without ML tagging.

8.1 Choosing the dataset folder

Use the Dataset folder field and Browse… button to point to a folder under:
<CDMF root>\training_datasets
Only .mp3 and .wav files in that folder will be affected.

8.2 Base tags

The Base tags field is a short ACE-Step prompt snippet written into each _prompt.txt. Example:

16-bit, 8-bit, SNES, retro RPG BGM, looping instrumental

8.3 Actions

Create prompt files – creates or updates _prompt.txt for each track using the base tags.
Create [inst] lyrics files – creates or updates _lyrics.txt files with just [inst].
Overwrite existing files – when checked, will overwrite existing prompt/lyrics files instead of skipping them.

A small status text and candystripe bar show when the tool is busy. Once complete, each track in the dataset should be ready to plug into the LoRA trainer.
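For a sense of what the Create prompt files action does, here is a minimal stand-alone sketch of the same idea, including the overwrite switch. It is not CDMF's own code, and the function name and the trailing newline are assumptions.

```python
from pathlib import Path


def write_prompt_files(dataset_dir, base_tags, overwrite=False):
    """Write <track>_prompt.txt next to each .mp3/.wav; return files written."""
    written = 0
    # sorted() snapshots the listing, so new files do not affect the loop.
    for audio in sorted(Path(dataset_dir).iterdir()):
        if audio.suffix.lower() not in {".mp3", ".wav"}:
            continue
        target = audio.with_name(audio.stem + "_prompt.txt")
        if target.exists() and not overwrite:
            continue  # keep hand-edited prompts unless told otherwise
        target.write_text(base_tags + "\n", encoding="utf-8")
        written += 1
    return written
```

Leaving overwrite off is the safe default: it fills in only the missing files and leaves any prompts you have already edited by hand untouched.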

9. MuFun-ACEStep analyzer (experimental)

The “Experimental – Analyze Dataset with MuFun-ACEStep” card lets you run a large MuFun model over a folder of audio to auto-generate prompts and lyrics.

9.1 Installing the MuFun model

Use the Install / Check button. CDMF will:
- Check whether the model is already present.
- Download it if needed into the CDMF models folder.
The model is large (tens of GB). Make sure you have enough disk space.

9.2 Running analysis

Select a dataset folder under training_datasets.
Optionally provide a Base tags string.
Optionally check Instrumental to force all lyrics to [inst].
Click Analyze Folder.

MuFun will:

Create _prompt.txt and _lyrics.txt files next to each track.
Include your base tags plus its own tags when writing prompts.
Show progress and results in the Results text area.

MuFun is powerful but not perfect. For high-stakes datasets, skim a few outputs and edit any bad tags or strange lyric outputs before training a LoRA.

10. Troubleshooting & FAQ

10.1 First launch is taking forever

Check the console window for pip errors (network, disk, permissions).
Ensure you have plenty of free disk space on the drive CDMF is installed on.
Slow networks will heavily impact model downloads.
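If you suspect disk space, a quick check like the one below reports free space on the drive that holds the install folder. The helper names are made up for this example, and the amount you need is your call; this guide only says "tens of GB".

```python
import shutil


def free_gigabytes(path):
    """Return the free space, in GiB, on the drive that contains path."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path, needed_gb):
    """True if the drive containing path has at least needed_gb free."""
    return free_gigabytes(path) >= needed_gb
```

Point it at the install location (by default under %LOCALAPPDATA%\CandyDungeonMusicForge) rather than at your desktop, since the two may live on different drives.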

10.2 “No .wav files found yet”

Generate a track first from the Generate tab.
Confirm that the Output directory field points to the same folder shown at the top of the Music Player.

10.3 GPU out-of-memory errors

Reduce Target length for generation.
Reduce Max clip seconds for training.
Lower the batch size if you’ve raised it. Gradient accumulation adds little VRAM on its own, so it can usually stay as-is.

10.4 Everything worked in dev but not in a fresh install

Compare the versions in your dev venv_ace with what’s in requirements_ace.txt in the installer payload.
If you had to pin or override certain packages (e.g. audio-separator, beartype, py3langid), make sure the bootstrap logic in CDMF.bat matches your dev environment.
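One way to do that comparison is to read the `name==version` pins from requirements_ace.txt and look up what is actually installed. The sketch below uses the standard library's `importlib.metadata`. The function name is hypothetical, and the parsing is deliberately simple: it only handles exact `==` pins and ignores extras and other specifiers.

```python
from importlib import metadata


def find_pin_mismatches(requirements_text, get_version=metadata.version):
    """Map package name -> (pinned, installed) for every pin that differs.

    installed is None when the package is not installed at all.
    Only exact 'name==version' lines are checked.
    """
    mismatches = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        pinned = pinned.split(";", 1)[0].strip()  # drop environment markers
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Run it with the Python interpreter inside venv_ace, passing in the text of requirements_ace.txt, so the lookup reflects that environment rather than your system Python.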

10.5 Uninstalling CDMF

Use the standard Windows “Add/Remove Programs” entry or the uninstaller created by Inno Setup. The uninstaller is configured to remove the app folder under %LOCALAPPDATA%\CandyDungeonMusicForge, including the large venv_ace folder, so you don’t have to hunt it down manually.

If you keep a lot of generated music under the default output directory, consider backing up your .wav files before uninstalling.
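A backup can be as simple as copying the folder in Explorer. If you prefer a script, the sketch below copies every .wav from one folder to another. The function name and both paths are yours to choose; nothing here is a CDMF feature.

```python
import shutil
from pathlib import Path


def backup_wavs(output_dir, backup_dir):
    """Copy every .wav in output_dir into backup_dir; return the count."""
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = 0
    for wav in sorted(Path(output_dir).glob("*.wav")):
        # copy2 also preserves timestamps, which keeps the Created order.
        shutil.copy2(wav, backup / wav.name)
        copied += 1
    return copied
```

Choose a backup folder outside %LOCALAPPDATA%\CandyDungeonMusicForge so the uninstaller does not remove it along with the app.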
