---
title: ACE-Step 1.5 Custom Edition
emoji: 🎵
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
python_version: '3.11'
hardware: zero-gpu-medium
---
# ACE-Step 1.5 Custom Edition
A fully featured implementation of ACE-Step 1.5 with a custom GUI and workflow capabilities, supporting both local use and HuggingFace Space deployment.
## Features

### 🎵 Three Main Interfaces
- Standard ACE-Step GUI: full-featured standard ACE-Step 1.5 interface with all original capabilities
- Custom Timeline Workflow: advanced timeline-based generation with:
  - 32-second clip generation (2s lead-in + 28s main + 2s lead-out)
  - Seamless clip blending for continuous music
  - Context Length slider (0–120 seconds) for style guidance
  - Master timeline with extend, inpaint, and remix capabilities
- LoRA Training Studio: complete LoRA training interface with:
  - Audio file upload and preprocessing
  - Custom training configuration
  - Model download/upload for continued training
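The Context Length slider above controls how much of the existing master timeline is fed back to the model as style guidance. A minimal sketch of that trimming step, assuming the timeline is a flat buffer of 48 kHz samples (`context_slice` is a hypothetical helper for illustration, not part of the actual codebase):

```python
SAMPLE_RATE = 48_000  # 48 kHz, per the Technical Details section

def context_slice(timeline, context_seconds):
    """Return the trailing `context_seconds` of `timeline` as guidance audio.

    A value of 0 disables style guidance entirely; if the timeline is
    shorter than the requested context, the whole timeline is used.
    """
    if context_seconds <= 0:
        return timeline[:0]
    n = int(context_seconds * SAMPLE_RATE)
    return timeline[-n:] if len(timeline) > n else timeline
```

With a 120-second ceiling on the slider, the guidance window never exceeds 5,760,000 samples per channel.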
## Architecture
- Base Model: ACE-Step v1.5 Turbo
- Framework: Gradio 5.9.1, PyTorch
- Deployment: Local execution + HuggingFace Spaces
- Audio Processing: DiT + VAE + 5Hz Language Model
## Installation

### Local Setup
```bash
# Clone the repository
git clone https://github.com/yourusername/ace-step-custom.git
cd ace-step-custom

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download ACE-Step model
python scripts/download_model.py

# Run the application
python app.py
```
### HuggingFace Space Deployment
1. Create a new Space on HuggingFace
2. Upload all files to the Space
3. Set the Space to use GPU hardware (recommended: H200 or A100)
4. The app will automatically download models and start
## Usage

### Standard Mode
Use the first tab for standard ACE-Step generation with all original features.
### Timeline Mode
1. Enter your prompt/lyrics
2. Adjust Context Length (how far back to reference previous clips)
3. Click "Generate" to create 32-second clips
4. Clips are automatically blended and added to the timeline
5. Use "Extend" to continue the song, or the other options for variations
### LoRA Training
1. Upload audio files for training
2. Configure training parameters
3. Train custom LoRA models
4. Download and reuse them for continued training
## System Requirements

### Minimum
- GPU: 8GB VRAM (with optimizations)
- RAM: 16GB
- Storage: 20GB
### Recommended
- GPU: 16GB+ VRAM (A100, H200, or high-end consumer GPUs)
- RAM: 32GB
- Storage: 50GB
## Technical Details
- Audio Format: 48kHz, stereo
- Generation Speed: ~8 inference steps (turbo model)
- Context Window: Up to 120 seconds for style guidance
- Blend Regions: 2-second crossfade between clips
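The 2-second blend region can be implemented as an equal-power crossfade: the tail of the previous clip fades out under a cosine curve while the head of the next clip fades in under a sine curve, keeping perceived loudness roughly constant. A minimal sketch, assuming mono clips as plain sample lists (`crossfade` is an illustrative helper, not the repository's actual implementation):

```python
import math

SAMPLE_RATE = 48_000
BLEND_SECONDS = 2

def crossfade(prev_clip, next_clip, blend_samples=BLEND_SECONDS * SAMPLE_RATE):
    """Equal-power crossfade of prev_clip's tail into next_clip's head."""
    assert len(prev_clip) >= blend_samples and len(next_clip) >= blend_samples
    out = list(prev_clip[:-blend_samples])
    for i in range(blend_samples):
        t = i / blend_samples
        g_out = math.cos(t * math.pi / 2)  # fade-out gain: 1 -> 0
        g_in = math.sin(t * math.pi / 2)   # fade-in gain:  0 -> 1
        out.append(prev_clip[len(prev_clip) - blend_samples + i] * g_out
                   + next_clip[i] * g_in)
    out.extend(next_clip[blend_samples:])
    return out
```

Because cos² + sin² = 1, summed signal power stays flat across the blend region, avoiding the audible dip that a plain linear crossfade produces.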
## Credits
Based on ACE-Step 1.5 by ACE Studio
- GitHub: https://github.com/ace-step/ACE-Step-1.5
- Original Demo: https://huggingface.co/spaces/ACE-Step/ACE-Step
## License
MIT License (see LICENSE file)