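01=Introduction to Stable Diffusion
With artificial intelligence developing rapidly, Stable Diffusion stands out as a deep-learning-based image generation technique. It produces high-quality images, is open source, and suits a wide range of application scenarios.
Stable Diffusion is a recent member of the diffusion model family. Researchers from CompVis, Stability AI, and LAION built it on top of the Latent Diffusion Model, and Stability AI officially released it in August 2022. It generates high-quality images in a more stable, controllable, and efficient way, and it shows how deep learning can turn text into visual content in the multimodal field.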
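Stable Diffusion is a latent diffusion model rather than a generative adversarial network (GAN). A variational autoencoder (VAE) compresses images into a lower-dimensional latent space. A U-Net denoiser, conditioned on the input text or image, removes noise from a random latent step by step, and the VAE decoder maps the result back to pixels. The condition comes from a pretrained text encoder, which is what makes the generated image follow the prompt. Adversarial training appears only in how the autoencoder was trained (the Latent Diffusion Model paper includes a patch-based discriminator in its loss); sampling itself involves no competing generator and discriminator.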
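The Stable Diffusion workflow converts text into an image in stages: the prompt is parsed and encoded, a latent is denoised step by step under that condition, and the result is decoded and refined into the final image.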
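As a rough illustration of the stages from text to image, the sketch below traces the shape of the data at each stage. The numbers are assumptions about a Stable Diffusion 2.x setup and are not taken from this article: 77 text tokens with 1024-dimensional embeddings, 4 latent channels, and a downsampling factor of 8. The function name trace_shapes is made up for illustration.

```python
def trace_shapes(height=768, width=768, tokens=77, text_dim=1024,
                 latent_channels=4, factor=8):
    """Return (stage, shape) pairs for one text-to-image run.

    All default values are assumptions for illustration only.
    """
    latent = (latent_channels, height // factor, width // factor)
    return [
        ("text embedding from the text encoder", (tokens, text_dim)),
        ("random latent fed to the denoiser", latent),
        ("denoised latent after sampling", latent),
        ("image from the VAE decoder", (3, height, width)),
    ]


if __name__ == "__main__":
    for stage, shape in trace_shapes():
        print(f"{stage}: {shape}")
```

The point of the sketch is that the expensive denoising loop runs on a latent far smaller than the final image, which is decoded only once at the end.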
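02=Pre-installation Preparation
Before installing Stable Diffusion (SD), we need to install Python. Python 3.10 or later is recommended.
1. Download Python from the address below, choosing the version you want to install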
https://www.python.org/downloads/
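2. Double-click the installer to install that version, and tick the option that adds Python to the environment variables (PATH).
3. Installation process... (omitted)
4. After installation succeeds, check the version information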
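The version can be checked from a command prompt with python --version. The same check can be done from inside Python. The snippet below is a minimal sketch; the helper name meets_minimum is made up for illustration, and the 3.10 threshold is the recommendation given in this article.

```python
import sys

# Minimum version recommended in this article.
MIN_VERSION = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the interpreter is at least the recommended version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("OK" if meets_minimum() else "Older than the recommended 3.10")
```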
03=Installing Stable Diffusion
1. Download the Stable Diffusion source code
git clone https://github.com/Stability-AI/stablediffusion.git
(Screenshot: the GitHub project page)
(Screenshot: the project folder after cloning)
2. Install the dependencies:
Upgrade pip:
python -m pip install --upgrade pip
Install the dependencies: install each of the libraries below in turn, using a command of the form pip install albumentations==0.4.3.
albumentations==0.4.3
opencv-python
pudb==2019.2
imageio==2.9.0
imageio-ffmpeg==0.4.2
pytorch-lightning==1.4.2
torchmetrics==0.6
omegaconf==2.1.1
test-tube>=0.7.5
streamlit>=0.73.1
einops==0.3.0
transformers==4.19.2
webdataset==0.2.5
open-clip-torch==2.7.0
gradio==3.13.2
kornia==0.6
invisible-watermark>=0.1.5
streamlit-drawable-canvas==0.8.0
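Batch installation: to install everything from the repository's requirements.txt in one step, run the following from the repository root (the -i option selects the Tsinghua PyPI mirror):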
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
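After installation it can help to confirm which versions actually ended up in the environment. The sketch below uses the standard library's importlib.metadata for that. The helper names parse_requirement and installed_version are made up for illustration, and the parser only handles the simple "name", "name==version", and "name>=version" forms that appear in the list above.

```python
import re
from importlib import metadata


def parse_requirement(line):
    """Split a line such as 'einops==0.3.0' into (name, operator, version).

    Lines without a version, such as 'opencv-python', give (name, None, None).
    """
    match = re.fullmatch(r"\s*([A-Za-z0-9_.\-]+)\s*(?:(==|>=)\s*([\w.]+))?\s*", line)
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return match.group(1), match.group(2), match.group(3)


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # A few entries copied from the list above; extend as needed.
    for line in ["albumentations==0.4.3", "opencv-python", "test-tube>=0.7.5"]:
        name, op, wanted = parse_requirement(line)
        print(f"{name}: wanted {op or ''}{wanted or 'any'}, "
              f"installed {installed_version(name)}")
```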
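Download the pretrained model v2-1_768-ema-pruned.ckpt and place it in the checkpoints folder (create the folder in the repository root if it does not exist). Download site: https://huggingface.co/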
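Before running the model, it is worth confirming that the checkpoint sits where the command line expects it. The sketch below checks for the file. It also shows one possible way to fetch it with the huggingface_hub package, which is assumed to be installed separately. The repo_id in the code is an assumption and does not come from this article, so verify it on the Hugging Face site first. Both helper names are made up for illustration.

```python
from pathlib import Path

CKPT_NAME = "v2-1_768-ema-pruned.ckpt"


def find_checkpoint(repo_root, name=CKPT_NAME):
    """Return the checkpoint path under <repo_root>/checkpoints, or None if absent."""
    path = Path(repo_root) / "checkpoints" / name
    return path if path.is_file() else None


def download_checkpoint(repo_root, name=CKPT_NAME):
    """Download the checkpoint with huggingface_hub (must be installed first).

    The repo_id below is an assumption, not taken from this article.
    """
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return hf_hub_download(
        repo_id="stabilityai/stable-diffusion-2-1",
        filename=name,
        local_dir=str(Path(repo_root) / "checkpoints"),
    )


if __name__ == "__main__":
    # Assumes the current directory is the cloned stablediffusion folder.
    found = find_checkpoint(".")
    print(found if found else "checkpoint not found; download it first")
```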
04=Running Stable Diffusion
1. Run Stable Diffusion
python ./scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt ./checkpoints/v2-1_768-ema-pruned.ckpt --config ./configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
Problem: ModuleNotFoundError: No module named 'ldm'
F:\StableDiffusion\stablediffusion>python ./scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt ./checkpoints/v2-1_768-ema-pruned.ckpt --config ./configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
Traceback (most recent call last):
  File "F:\StableDiffusion\stablediffusion\scripts\txt2img.py", line 16, in <module>
    from ldm.util import instantiate_from_config
ModuleNotFoundError: No module named 'ldm'
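Solution: copy the ldm folder into the scripts folder, or create a symbolic link to it there. Python searches the folder that contains the script being run (scripts), not the repository root where ldm is located.
Run the Stable Diffusion command again: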
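An alternative that avoids copying files is to put the repository root on Python's module search path, for example by setting the PYTHONPATH environment variable to the repository root before launching the script. The sketch below shows the same idea from inside Python. The helper names add_repo_root and can_import are made up for illustration, and the sketch has not been tested against this repository.

```python
import importlib.util
import sys
from pathlib import Path


def add_repo_root(repo_root):
    """Put repo_root at the front of sys.path if it is not already listed."""
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


def can_import(module_name):
    """Return True if Python can locate a top-level module or package."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumes the current directory is the cloned stablediffusion folder.
    add_repo_root(".")
    print("ldm importable:", can_import("ldm"))
```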
F:\StableDiffusion\stablediffusion>python ./scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt ./checkpoints/v2-1_768-ema-pruned.ckpt --config ./configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768
F:\StableDiffusion\stablediffusion\scripts\ldm\models\diffusion\dpm_solver\dpm_solver.py:16: SyntaxWarning: invalid escape sequence '\h'
"""Create a wrapper class for the forward SDE (VP type).
Global seed set to 42
Loading model from ./checkpoints/v2-1_768-ema-pruned.ckpt
F:\StableDiffusion\stablediffusion\scripts\txt2img.py:30: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
pl_sd = torch.load(ckpt, map_location="cpu")
Global Step: 110000
No module 'xformers'. Proceeding without it.
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Downloading: "https://github.com/DagnyT/hardnet/raw/master/pretrained/train_liberty_with_aug/checkpoint_liberty_with_aug.pth" to C:\Users\fangt/.cache\torch\hub\checkpoints\checkpoint_liberty_with_aug.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.10M/5.10M [00:00<00:00, 9.56MB/s]
If the run fails because the GPU runs out of memory:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.49 GiB. GPU 0 has a total capacty of 11.99 GiB of which 0 bytes is free. Of the allocated memory 14.77 GiB is allocated by PyTorch, and 9.52 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Solution: reduce the output size to 512 x 512 to save GPU memory, using --H 512 --W 512
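To see why a smaller output helps, note that txt2img.py denoises a latent whose sides are smaller than the image by a downsampling factor (its --f option, 8 by default), and that a full self-attention matrix grows with the square of the number of latent positions. The sketch below is only a back-of-the-envelope estimate of that relative growth. It does not measure real GPU usage, which also depends on batch size, precision, and the attention implementation. The function names are made up for illustration.

```python
def latent_tokens(height, width, factor=8):
    """Number of latent positions for an image, given the downsampling factor."""
    return (height // factor) * (width // factor)


def attention_entries(height, width, factor=8):
    """Entries in one full self-attention matrix over all latent positions."""
    tokens = latent_tokens(height, width, factor)
    return tokens * tokens


if __name__ == "__main__":
    for size in (768, 512):
        print(size, latent_tokens(size, size), attention_entries(size, size))
    ratio = attention_entries(768, 768) / attention_entries(512, 512)
    print(f"768 vs 512 attention matrix ratio: {ratio:.2f}")
```

Going from 768 to 512 per side cuts the latent positions from 9216 to 4096, so a full attention matrix over them shrinks by a factor of about 5.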
F:\StableDiffusion\stablediffusion>python ./scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt ./checkpoints/v2-1_768-ema-pruned.ckpt --config ./configs/stable-diffusion/v2-inference-v.yaml --H 512 --W 512
That is all for this installment. Next time we will continue with more on deploying and using Stable Diffusion on Windows 10. Thank you!