Stable Diffusion Usage Notes

Introduction

Stable Diffusion is an open-source text-to-image generation model. It can run locally and supports custom training and extensions.

Installation and Deployment

This is how I install and deploy Stable Diffusion:

Using the WebUI (recommended)

The simplest option is AUTOMATIC1111's WebUI:

# Clone the repository
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# Install dependencies and launch (Windows)
webui-user.bat

# Install dependencies and launch (Linux/Mac)
./webui.sh

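Docker Deployment

# Note: stabilityai/stable-diffusion-2-1-base is a Hugging Face model repository name.
# Confirm that an image with this name exists in your registry, or substitute the image you actually use.
docker run -it --gpus all -p 7860:7860 \
  -v "$(pwd)/models:/app/models" \
  -v "$(pwd)/outputs:/app/outputs" \
  stabilityai/stable-diffusion-2-1-base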

Basic Usage

WebUI Interface

Enter a prompt in the text box
Set the parameters (sampling steps, CFG Scale, and so on)
Click Generate to produce the image

API Calls

The API is available only when the WebUI is started with the --api flag.

import base64
import io

import requests
from PIL import Image

url = "http://localhost:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "a beautiful landscape, mountains, sunset, 4k",
    "negative_prompt": "blurry, low quality",
    "steps": 20,
    "cfg_scale": 7,
    "width": 512,
    "height": 512,
    "sampler_name": "Euler a"
}

response = requests.post(url, json=payload, timeout=300)
response.raise_for_status()  # fail early on HTTP errors
result = response.json()

# Decode the first base64-encoded image and save it
image = Image.open(io.BytesIO(base64.b64decode(result["images"][0])))
image.save("output.png")
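
The response can contain more than one image (for example when a batch is requested), so it may be useful to save all of them rather than only the first. The following is a minimal sketch that needs only the standard library; the helper names `decode_images` and `save_images` and the file-naming scheme are my own assumptions, not part of the WebUI API.

```python
import base64


def decode_images(result: dict) -> list[bytes]:
    """Return the raw image bytes for every entry in result["images"]."""
    decoded = []
    for encoded in result.get("images", []):
        # Tolerate an optional data-URI prefix ("data:image/png;base64,...")
        # by keeping only the part after the last comma.
        _, _, b64_part = encoded.rpartition(",")
        decoded.append(base64.b64decode(b64_part))
    return decoded


def save_images(result: dict, prefix: str = "output") -> list[str]:
    """Write each decoded image to <prefix>_NN.png and return the paths."""
    paths = []
    for index, data in enumerate(decode_images(result)):
        path = f"{prefix}_{index:02d}.png"
        with open(path, "wb") as handle:
            handle.write(data)
        paths.append(path)
    return paths
```

With the `result` dictionary from the request above, `save_images(result)` would write `output_00.png`, `output_01.png`, and so on into the current directory.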

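Writing Prompts

These are lessons I have picked up while writing Stable Diffusion prompts:

Basic Structure

[subject], [details], [style], [quality], [parameters]

Example:

a cat sitting on a windowsill, sunlight, detailed fur, photorealistic, 8k, masterpiece

Negative Prompt

blurry, low quality, distorted, watermark, text, signature

Common Quality Tags

masterpiece, best quality - high overall quality
highly detailed - fine detail
8k, 4k - high-resolution look (a style hint; the actual output size is set by width and height)
sharp focus - crisp focus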
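
The subject, details, style, quality ordering above can be applied mechanically when prompts are assembled in a script. This is a small sketch of that idea; `build_prompt` is a hypothetical helper of my own, not something provided by the WebUI.

```python
def build_prompt(subject, details=(), style=(), quality=()):
    """Join prompt parts in subject, details, style, quality order."""
    parts = [subject, *details, *style, *quality]
    # Drop empty entries and stray whitespace while keeping the order.
    cleaned = [part.strip() for part in parts if part and part.strip()]
    return ", ".join(cleaned)


prompt = build_prompt(
    "a cat sitting on a windowsill",
    details=["sunlight", "detailed fur"],
    style=["photorealistic"],
    quality=["8k", "masterpiece"],
)
print(prompt)
```

Keeping the parts in separate lists makes it easy to swap one group (say, the style tags) while holding the rest of the prompt fixed.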

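Model Management

Downloading Models

Download a model from Hugging Face
Place it in the models/Stable-diffusion/ directory
Refresh the model list in the WebUI

LoRA Models

LoRA is a lightweight way to fine-tune a model:

Download the LoRA file into models/Lora/
Reference it in the prompt: <lora:model_name:0.8>
Weight range: 0-1; 0.7-0.9 is a good starting point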
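
When prompts are generated from a script, the LoRA tag can be formatted by a small helper that also enforces the 0-1 range from these notes. This is only a sketch: `lora_tag` and `with_loras` are hypothetical names, and the range check reflects the guidance above rather than a limit imposed by the WebUI.

```python
def lora_tag(name: str, weight: float = 0.8) -> str:
    """Format a LoRA reference in the <lora:name:weight> prompt syntax."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError(f"weight {weight} is outside the 0-1 range used in these notes")
    # The :g format drops trailing zeros, so 0.80 becomes 0.8.
    return f"<lora:{name}:{weight:g}>"


def with_loras(prompt: str, loras: dict[str, float]) -> str:
    """Append one LoRA tag per entry of {name: weight} to the prompt."""
    tags = [lora_tag(name, weight) for name, weight in loras.items()]
    return " ".join([prompt, *tags])
```

For example, `with_loras("a cat", {"model_name": 0.8})` yields `a cat <lora:model_name:0.8>`.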

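Advanced Features

ControlNet

ControlNet gives precise control over the composition of the generated image (it requires the ControlNet extension):

# Control composition with Canny edge detection.
# edge_image is the base64-encoded control image.
# The exact key names, and where this dictionary goes in the request,
# depend on the version of the ControlNet extension.
controlnet_args = {
    "controlnet_units": [{
        "input_image": edge_image,
        "module": "canny",
        "model": "control_canny-fp16",
        "weight": 1.0
    }]
}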

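Inpainting

Redraws only the masked part of an image:

# Sent to /sdapi/v1/img2img; base64_image and base64_mask are base64-encoded images
payload = {
    "prompt": "a red car",
    "init_images": [base64_image],
    "mask": base64_mask,
    "inpainting_fill": 1,       # 1 = start from the original content under the mask
    "inpaint_full_res": True    # inpaint the masked region at full resolution
}

Img2Img

Image-to-image conversion:

# Sent to /sdapi/v1/img2img
payload = {
    "prompt": "anime style",
    "init_images": [base64_image],
    "denoising_strength": 0.75  # higher values move further from the input image
}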
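
Both payloads above expect images as base64 strings, so a script needs a step that reads the files and encodes them. The sketch below builds the same dictionaries from file paths; `encode_image` and `img2img_payload` are hypothetical helpers, and the inpainting defaults simply repeat the values used in these notes.

```python
import base64
from pathlib import Path


def encode_image(path) -> str:
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def img2img_payload(prompt, image_path, denoising_strength=0.75, mask_path=None):
    """Build an img2img payload; passing mask_path turns it into inpainting."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    payload = {
        "prompt": prompt,
        "init_images": [encode_image(image_path)],
        "denoising_strength": denoising_strength,
    }
    if mask_path is not None:
        # Same inpainting settings as in the example above.
        payload["mask"] = encode_image(mask_path)
        payload["inpainting_fill"] = 1
        payload["inpaint_full_res"] = True
    return payload
```

The returned dictionary can be posted to the img2img endpoint in the same way as the txt2img request shown earlier.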

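Parameter Tuning

Choosing a Sampler

Euler a: fast, good quality (recommended)
DPM++ 2M Karras: high quality, medium speed
DDIM: stable, well suited to Img2Img

CFG Scale

7-9: standard range, balances creativity and prompt adherence
10-12: follows the prompt more strictly
Below 7: more creative, but may drift from the prompt

Sampling Steps

20-30: standard range
50+: high quality, but slow
Below 20: fast, but quality may drop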
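
One way to compare these settings is to generate the same prompt across a small grid of sampler, CFG Scale, and step values. The sketch below only builds the payloads; the grid values are picked from the ranges above, and `parameter_grid` is a hypothetical helper. Fixing `seed` in the base payload keeps the comparison fair, since otherwise each image starts from different noise.

```python
from itertools import product

# Candidate values taken from the ranges listed in these notes.
SAMPLERS = ["Euler a", "DPM++ 2M Karras"]
CFG_SCALES = [7, 9, 11]
STEP_COUNTS = [20, 30]


def parameter_grid(base_payload, samplers=SAMPLERS,
                   cfg_scales=CFG_SCALES, step_counts=STEP_COUNTS):
    """Yield one payload per (sampler, cfg_scale, steps) combination."""
    for sampler, cfg, steps in product(samplers, cfg_scales, step_counts):
        # Copy the base payload so each combination gets its own dict.
        yield {**base_payload, "sampler_name": sampler,
               "cfg_scale": cfg, "steps": steps}


base = {"prompt": "a beautiful landscape, mountains, sunset, 4k", "seed": 1234}
payloads = list(parameter_grid(base))
print(len(payloads), "combinations")
```

Each payload can then be sent to the txt2img endpoint and the outputs compared side by side.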

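Performance Optimization

Using xFormers

pip install xformers

Then add this to the launch arguments: --xformers

Using TensorRT

On NVIDIA GPUs, TensorRT can be used for acceleration:

# Install TensorRT
pip install nvidia-tensorrt

Memory Optimization

# Low-VRAM mode
--lowvram

# Medium-VRAM mode
--medvram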
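
The flags in this section can be combined into a single launch-argument string. The sketch below picks a memory flag from the amount of VRAM available; the 4 GB and 8 GB thresholds are my own rough assumptions and not official guidance, and `launch_flags` is a hypothetical helper.

```python
def launch_flags(vram_gb, use_xformers=True,
                 low_vram_below_gb=4.0, med_vram_below_gb=8.0):
    """Return WebUI launch flags for a GPU with vram_gb of memory.

    The thresholds are illustrative assumptions; adjust them to what
    actually works on your hardware.
    """
    flags = []
    if use_xformers:
        flags.append("--xformers")
    if vram_gb < low_vram_below_gb:
        flags.append("--lowvram")
    elif vram_gb < med_vram_below_gb:
        flags.append("--medvram")
    return flags


print(" ".join(launch_flags(6)))  # prints: --xformers --medvram
```

The resulting string can be placed in the COMMANDLINE_ARGS variable in webui-user.bat or webui-user.sh.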

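Custom Training

Training a LoRA

# Train with the kohya_ss training script
python train_network.py \
  --pretrained_model_name_or_path=model.safetensors \
  --train_data_dir=./dataset \
  --output_dir=./output \
  --network_module=networks.lora

Preparing the Dataset

Prepare 10-50 high-quality images
Resize them to a uniform size (512x512 or 768x768)
Use a tagging tool to generate caption files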
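
Before starting a training run, it can help to check the dataset against the list above. The sketch below counts the images and reports any that lack a caption. It assumes captions are stored as a .txt file with the same base name as the image, which is a common convention but depends on how the training script is configured; `check_dataset` is a hypothetical helper.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def check_dataset(folder, min_images=10, max_images=50):
    """Summarize image count and missing captions for a dataset folder."""
    folder = Path(folder)
    images = sorted(
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
    # Assumption: the caption for cat.png lives in cat.txt next to it.
    missing = [path.name for path in images
               if not path.with_suffix(".txt").exists()]
    return {
        "image_count": len(images),
        "count_in_range": min_images <= len(images) <= max_images,
        "missing_captions": missing,
    }
```

Calling `check_dataset("./dataset")` returns a small report dictionary; an empty `missing_captions` list means every image has a caption file.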

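FAQ

These are some common problems I have run into while using Stable Diffusion:

Q: How do I generate a specific person?

A: Train a model of that person with LoRA or Dreambooth.

Q: What can I do when generation is slow?

A: Use xFormers, lower the resolution, or reduce the number of sampling steps.

Q: How do I improve output quality?

A: Refine the prompt, use a high-quality model, increase the sampling steps, and adjust the CFG Scale.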

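Best Practices

Prompt engineering: describe the scene in detail and use quality tags
Model selection: pick a model that fits the task
Parameter tuning: adjust parameters based on the results
Batch generation: use a script to generate in batches, then filter
Version tracking: record prompt and parameter combinations that work well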
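
Recording good prompt and parameter combinations can be as simple as appending each payload to a JSON Lines file. This is a minimal sketch using only the standard library; the file layout and the helper names `log_generation` and `load_log` are my own assumptions.

```python
import json
from datetime import datetime, timezone


def log_generation(log_path, payload, note=""):
    """Append one generation record (payload plus a note) to a JSONL file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "note": note,
        "payload": payload,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_log(log_path):
    """Read every record back from the JSONL file."""
    with open(log_path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each line is an independent JSON object, the log can be appended to from several scripts and filtered later with ordinary text tools.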