This article discusses the improvements made to Denoising Diffusion Probabilistic Models (DDPM) in terms of log-likelihood and noise schedule, highlighting their competitive performance on high-diversity datasets like ImageNet.

This article is a walkthrough of the paper "Improved Denoising Diffusion Probabilistic Models".

Improving the Log-likelihood

In this paper, we show that DDPMs can achieve log-likelihoods competitive with other likelihood-based models, even on high-diversity datasets like ImageNet.

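The motivation is this: although DDPMs have been found to generate high-fidelity samples as measured by FID and Inception Score, these models had not achieved competitive log-likelihoods.

Log-likelihood is a widely used metric in generative modeling, and it is generally believed that optimizing log-likelihood forces a generative model to capture all the modes of the data distribution.

So the goal of this work is to improve the log-likelihood of diffusion models.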

0 Increasing T

The authors found that increasing $T$ to 4000 improves the log-likelihood; the rest of this section uses $T = 4000$.

1 Learning $Σ_θ(x_t, t)$

Improved Denoising Diffusion Probabilistic Models: Enhancing Log-likelihood and Noise Schedule

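At the first few steps $\tilde{\beta}_t$ is very small compared with $\beta_t$; as the number of diffusion steps grows, $\beta_t$ and $\tilde{\beta}_t$ become almost equal.
This suggests that, with more diffusion steps, the choice of $\sigma_t$ may not matter at all for sample quality. In other words, as we add more diffusion steps, the model mean $\mu_\theta(x_t, t)$ determines the distribution much more than $\Sigma_\theta(x_t, t)$ does.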
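
To make the comparison between $\beta_t$ and $\tilde{\beta}_t$ concrete, here is a minimal NumPy sketch. It assumes the standard DDPM posterior variance $\tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t$ and a linear schedule from $10^{-4}$ to $0.02$ with $T = 1000$; neither is spelled out in the text above, and the function names are only illustrative.

```python
import numpy as np

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    # Assumed linear schedule (values from the original DDPM setup).
    return np.linspace(beta_start, beta_end, T, dtype=np.float64)

def posterior_variance(betas):
    # beta_tilde_t = (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t) * beta_t,
    # with alpha_bar_0 defined as 1.
    alpha_bar = np.cumprod(1.0 - betas)
    alpha_bar_prev = np.concatenate(([1.0], alpha_bar[:-1]))
    return (1.0 - alpha_bar_prev) / (1.0 - alpha_bar) * betas

betas = linear_betas(1000)
ratio = posterior_variance(betas) / betas

# Ratio beta_tilde_t / beta_t at t = 1, 2, 10, 100, 1000.
print(ratio[[0, 1, 9, 99, 999]])
```

Under these assumptions the ratio starts at zero for the first step and rises quickly toward one, which is the behaviour the paragraph above describes.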

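Figure 2 shows that the first few steps of the diffusion process contribute the most to the variational lower bound. It therefore seems that the log-likelihood could be improved by choosing $\Sigma_\theta(x_t, t)$ better.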

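It works better to parameterize the variance as an interpolation between $\beta_t$ and $\tilde{\beta}_t$ in the log domain.
The model outputs a vector $v$ containing one component per dimension, and this output is turned into variances as follows:
$\Sigma_\theta(x_t, t) = \exp\left(v \log \beta_t + (1 - v) \log \tilde{\beta}_t\right)$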
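
The log-domain interpolation can be sketched in a few lines of NumPy. The function name and the sample values of $\beta_t$, $\tilde{\beta}_t$ and $v$ below are hypothetical, and $v$ is assumed to already lie in $[0, 1]$.

```python
import numpy as np

def interpolate_variance(v, beta_t, beta_tilde_t):
    # Interpolate in the log domain: v = 1 gives beta_t, v = 0 gives beta_tilde_t.
    log_var = v * np.log(beta_t) + (1.0 - v) * np.log(beta_tilde_t)
    return np.exp(log_var)

beta_t, beta_tilde_t = 0.02, 0.01   # hypothetical values for a single timestep
v = np.array([0.0, 0.5, 1.0])       # hypothetical per-dimension network outputs
sigma = interpolate_variance(v, beta_t, beta_tilde_t)
print(sigma)
```

With $v = 0.5$ the result is the geometric mean of the two bounds rather than the arithmetic mean, which is what interpolating in the log domain amounts to. Note that $\tilde{\beta}_t$ is zero at the very first step under the usual definition, so a real implementation would need to guard the logarithm there.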

Since $L_{\text{simple}}$ doesn't depend on $\Sigma_\theta(x_t, t)$, we define a new hybrid objective:
$L_{\text{hybrid}} = L_{\text{simple}} + \lambda L_{\text{vlb}}$

2 Improving the Noise Schedule

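The linear noise schedule works well for high-resolution images but is sub-optimal for images at 64 × 64 and 32 × 32. At these resolutions the end of the forward noising process is too noisy, so it contributes little to sample quality.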

Figure 4: skipping up to 20% of the reverse process has no noticeable effect on FID.

Given the above, the paper constructs a different noise schedule, defined in terms of $\bar{\alpha}_t$:
$\bar{\alpha}_t = \frac{f(t)}{f(0)}, \quad f(t) = \cos\left(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\right)^2$
$\beta_t = 1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}$
$\beta_t$ is clipped so that $\beta_t \leq 0.999$, to prevent singularities as the diffusion process approaches $t = T$.
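
The schedule definition translates fairly directly into NumPy. The sketch below is one possible reading of the formulas, with illustrative function names; the default $s = 0.008$ is the offset value used in the paper.

```python
import numpy as np

def cosine_alpha_bar(T, s=0.008):
    # alpha_bar_t = f(t) / f(0) for t = 0..T,
    # with f(t) = cos(((t / T + s) / (1 + s)) * pi / 2) ** 2.
    t = np.arange(T + 1, dtype=np.float64)
    f = np.cos((t / T + s) / (1.0 + s) * np.pi / 2.0) ** 2
    return f / f[0]

def cosine_betas(T, s=0.008, max_beta=0.999):
    # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped at max_beta.
    alpha_bar = cosine_alpha_bar(T, s)
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

betas = cosine_betas(4000)
print(betas[:3], betas[-3:])
```

Without the clip, the last $\beta_t$ would be essentially 1 because $\bar{\alpha}_T$ is numerically zero, which is the singular behaviour the 0.999 limit guards against.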

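Figure 5 shows that the linear schedule destroys the image into noise more quickly, whereas the cosine schedule changes very little near $t = 0$ and $t = T$, which prevents abrupt changes in noise level.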
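
A quick numerical comparison of $\bar{\alpha}_t$ under the two schedules illustrates the same point. This is only a sketch: the linear schedule is assumed to run from $10^{-4}$ to $0.02$ over $T = 1000$ steps (the original DDPM values, not stated in this article), and the variable names are illustrative.

```python
import numpy as np

T = 1000

# Linear schedule: beta from 1e-4 to 0.02 (assumed values).
linear_alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

# Cosine schedule with offset s = 0.008.
s = 0.008
t = np.arange(1, T + 1)

def f(u):
    return np.cos((u / T + s) / (1.0 + s) * np.pi / 2.0) ** 2

cosine_alpha_bar = f(t) / f(0)

# Compare the remaining signal level at a quarter, half and three quarters of the process.
for frac in (0.25, 0.5, 0.75):
    i = int(frac * T) - 1
    print(f"t/T={frac}: linear={linear_alpha_bar[i]:.3f}, cosine={cosine_alpha_bar[i]:.3f}")
```

Under these assumptions the linear schedule has already removed most of the signal by the midpoint, while the cosine schedule is still close to one half there.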

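Too little noise at the start makes it hard for the network to predict $\epsilon$ accurately, so a small offset $s = 0.008$ is used to keep $\beta_t$ from being too small near $t = 0$.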

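The choice of $\cos^2$ is arbitrary: the authors wanted a function that is flat at both ends and drops off linearly in the middle, and any other function with a similar shape that works would do as well.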

3 Reducing Gradient Noise

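In theory, optimizing $L_{\text{vlb}}$ directly should give better log-likelihoods. In practice, however, the training curves show that $L_{\text{vlb}}$ is hard to optimize, and $L_{\text{hybrid}}$ even reaches better log-likelihoods. Both curves are also very noisy.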

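The aim is to optimize $L_{\text{vlb}}$ directly, which requires a way to reduce its variance. Because the terms in Figure 2 differ greatly in magnitude, the authors hypothesized that sampling $t$ uniformly causes unnecessary gradient noise, and proposed importance sampling:
$L_{\text{vlb}} = E_{t \sim p_t}\left[\frac{L_t}{p_t}\right], \text{ where } p_t \propto \sqrt{E\left[L_t^2\right]} \text{ and } \sum p_t = 1$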

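Since $E[L_t^2]$ is not known exactly, it is estimated by keeping the 10 most recent loss values for each timestep and averaging their squares. Timesteps with larger losses are therefore sampled more often, while the $1/p_t$ weight keeps the estimate of $L_{\text{vlb}}$ unbiased, which makes the overall loss more stable.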
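
The sampling scheme can be sketched as a small NumPy class. This is a minimal illustration rather than the authors' implementation: the class name, the uniform warm-up until every timestep has a full history, and the $1/(T p_t)$ weight scaling (which yields the mean over timesteps instead of the sum) are all assumptions made here.

```python
import numpy as np

class LossAwareSampler:
    """Sample timesteps with p_t proportional to sqrt(E[L_t^2])."""

    def __init__(self, T, history=10, seed=0):
        self.T = T
        self.history = history
        self.losses = np.zeros((T, history))
        self.counts = np.zeros(T, dtype=int)
        self.rng = np.random.default_rng(seed)

    def probabilities(self):
        # Fall back to uniform sampling until every timestep has a full history.
        if not np.all(self.counts == self.history):
            return np.full(self.T, 1.0 / self.T)
        p = np.sqrt(np.mean(self.losses ** 2, axis=1))
        return p / p.sum()

    def sample(self, batch_size):
        p = self.probabilities()
        t = self.rng.choice(self.T, size=batch_size, p=p)
        weights = 1.0 / (self.T * p[t])  # equals 1 under uniform sampling
        return t, weights

    def update(self, t, loss):
        for ti, li in zip(t, loss):
            if self.counts[ti] < self.history:
                self.losses[ti, self.counts[ti]] = li
                self.counts[ti] += 1
            else:
                # Drop the oldest value and append the newest.
                self.losses[ti, :-1] = self.losses[ti, 1:]
                self.losses[ti, -1] = li

# Toy usage: timestep 0 has a loss three times larger than the others.
sampler = LossAwareSampler(T=4)
for _ in range(10):
    sampler.update([0, 1, 2, 3], [3.0, 1.0, 1.0, 1.0])
p = sampler.probabilities()
t, weights = sampler.sample(batch_size=8)
print(p, t, weights)
```

In the toy usage the high-loss timestep receives the largest sampling probability and the smallest weight, so it is visited more often without dominating the weighted loss.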

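Training $L_{\text{vlb}}$ alone with importance sampling does work, and the loss becomes more stable. Training $L_{\text{hybrid}}$ directly can bring the loss down to about the same level as $L_{\text{vlb}}$ with importance sampling, ~~though its loss curve is slightly less stable, so either can be chosen freely.~~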

(from Other 4

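Personally, I think the second sentence is problematic. Importance sampling does help $L_{\text{vlb}}$, but for $L_{\text{hybrid}}$ the paper only says:

We found that the importance sampling technique was not helpful when optimizing the less-noisy $L_{\text{hybrid}}$ objective directly.

That is, it did not help when optimizing the less-noisy hybrid objective.

My guess is that it should still help where the noise is large, so importance sampling may also be useful for $L_{\text{hybrid}}$ in those cases, although the paper does not report this.

So either $L_{\text{hybrid}}$ or $L_{\text{vlb}}$ can be used for training.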

Improving Sampling Speed

We evaluate FIDs for an $L_{\text{hybrid}}$ model and an $L_{\text{simple}}$ model that were trained with 4000 steps, using 25, 50, 100, 200, 400, 1000, and 4000 sampling steps.

The $L_{\text{hybrid}}$ model with learnt sigmas maintains high sample quality. With this model, 100 sampling steps is sufficient to achieve near-optimal FIDs for our fully trained models.
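
One way to pick the reduced set of sampling steps is sketched below. It follows the paper's description of taking $K$ evenly spaced real numbers between 1 and $T$ and rounding each to the nearest integer; the function name is hypothetical, and the rest of the sampler (recomputing the variances along this subsequence) is not shown.

```python
import numpy as np

def sampling_timesteps(T, K):
    # K evenly spaced real numbers between 1 and T (inclusive), rounded to integers.
    return np.rint(np.linspace(1, T, K)).astype(int)

steps = sampling_timesteps(4000, 100)
print(steps[:5], steps[-5:])
```

For a model trained with 4000 steps, this selects 100 timesteps roughly 40 apart, always including the first and the last step.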

Other
