VAE and CVAE Models Explained

Common Models Explained (series)

§: This article

§: Some basic mathematical concepts

0. Preface

When discussing the ACT model, we mentioned VAE and CVAE. This article gives an intuitive overview of these two kinds of generative models.

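VAE: places the autoencoder in a probabilistic framework and uses variational inference plus the reparameterization trick to turn "learning a latent distribution we can sample from" into a differentiable objective, a reconstruction term plus a KL regularizer, so the model can both compress and generate.
CVAE: explicitly adds a condition $c$ to the latent variable and/or the decoder, so the generative distribution becomes $p_\theta(x\mid z,c)$; this enables controlled generation (producing samples for a given label, text prompt, and so on).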

AE (Autoencoder)

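Structure: the encoder maps the input $x$ to a deterministic latent vector $z=f_\phi(x)$; the decoder reconstructs $\hat x=g_\theta(z)$.
Training objective: minimize the reconstruction error (no prior and no $\mathrm{KL}$ regularizer). A common loss is
$$ \mathcal{L}_{\mathrm{AE}}(\theta,\phi) = \mathbb{E}_{x\sim\mathcal{D}}\big[\ell\!\left(x,\; g_\theta\!\big(f_\phi(x)\big)\right)\big], $$
- where $\ell$ can be the mean squared error (MSE), cross-entropy, and so on.
Purpose: learns useful representations and compression; however, the latent space is often discontinuous, which makes smooth sampling for generation difficult.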
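As a minimal sketch of $\mathcal{L}_{\mathrm{AE}}$, the Python below assumes a hypothetical tied linear autoencoder ($f_\phi(x)=Wx$, $g_\theta(z)=W^\top z$) with MSE as $\ell$, and averages the loss over a tiny made-up dataset. It only evaluates the loss; training would additionally need gradients and an optimizer.

```python
def matvec(W, v):
    """Multiply a matrix W (given as a list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def transpose(W):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*W)]

def mse(x, x_hat):
    """Mean squared error between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def ae_loss(data, W):
    """Empirical L_AE for a hypothetical tied linear autoencoder:
    z = f_phi(x) = W x (deterministic) and x_hat = g_theta(z) = W^T z."""
    Wt = transpose(W)
    total = 0.0
    for x in data:
        z = matvec(W, x)        # deterministic code: no sampling, no prior
        x_hat = matvec(Wt, z)   # reconstruction
        total += mse(x, x_hat)
    return total / len(data)

# Hypothetical 2-D data compressed to a 1-D code
data = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
W = [[1.0, 0.0]]   # keeps only the first coordinate
print(ae_loss(data, W))
```

With `W = [[1.0, 0.0]]` the second coordinate is discarded, so only the sample `[0.0, 1.0]` contributes error and the loss is 0.5/3. Nothing in this loss constrains where the codes $z$ land, which is the gap the VAE's prior and KL term address.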

VAE (Variational Autoencoder)

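Core idea: have the encoder output distribution parameters (e.g., a mean $\mu_\phi(x)$ and a variance $\sigma_\phi^2(x)$) that define the approximate posterior $q_\phi(z\mid x)=\mathcal N\!\big(\mu_\phi(x),\operatorname{diag}(\sigma_\phi^2(x))\big)$, then sample $z$ from it and feed it to the decoder.
Reparameterization trick: to let gradients flow through the sampling step, write $$ z=\mu_\phi(x)+\sigma_\phi(x)\odot\varepsilon,\quad \varepsilon\sim\mathcal N(0,I). $$
Objective (maximize the ELBO, equivalently minimize the negative ELBO): $$ \mathcal L_{\mathrm{VAE}}(\theta,\phi) = \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right] - \mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right) $$
- The prior is usually $p(z)=\mathcal N(0,I)$.
- The first term is the reconstruction term, which encourages recovering the input;
- The second term is the regularizer, which pulls the approximate posterior toward the prior so that the latent space is smooth, interpolable, and easy to sample.
Strengths: a continuous latent space that supports controlled interpolation and random sampling of new data.
Limitations: the basic VAE is an unconditional generator, so it is hard to request samples of a specific class; reconstructions also tend to be blurry (partly because of assumptions such as a Gaussian per-pixel likelihood).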

VAE Mathematical Derivation

Suppose we have $N$ i.i.d. samples

$$ X=(x^1,x^2,\cdots,x^N). $$

Our goal is maximum likelihood estimation of the generative model's parameters $\theta$. The likelihood is $p_\theta(X)$, and under the i.i.d. assumption

$$ \log p_\theta(X)=\sum_{i=1}^N \log p_\theta(x^i). $$

First consider the log-likelihood of a single sample $x$. For any latent value $z$, the identity

$$ \log p_\theta(x)=\log \frac{p_\theta(x,z)}{p_\theta(z\mid x)} $$

holds whenever $p_\theta(z\mid x)>0$; it is Bayes' rule $p_\theta(x,z)=p_\theta(z\mid x)\,p_\theta(x)$ rearranged.

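Introduce a variational distribution $q_\phi(z\mid x)$ to approximate the intractable posterior $p_\theta(z\mid x)$, and integrate both sides of the identity against $q_\phi(z\mid x)$.

Left-hand side (using $\int q_\phi(z\mid x)\,dz=1$):

$$ \int q_\phi(z\mid x)\,\log p_\theta(x)\,dz=\log p_\theta(x), $$

Right-hand side (multiplying and dividing by $q_\phi(z\mid x)$ inside the logarithm):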

$$ \begin{aligned} \int q_\phi(z\mid x)\,\log \frac{p_\theta(x,z)}{p_\theta(z\mid x)}\,dz &=\int q_\phi(z\mid x)\,\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\,dz \;+\;\mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p_\theta(z\mid x)\right). \end{aligned} $$

Therefore

$$ \log p_\theta(x) =\underbrace{\int q_\phi(z\mid x)\,\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\,dz}_{\text{ELBO}} +\mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p_\theta(z\mid x)\right). $$

Since $\mathrm{KL}\ge 0$, the lower bound (ELBO) satisfies

$$ \log p_\theta(x)\;\ge\;\mathcal{L}(\theta,\phi;x) :=\int q_\phi(z\mid x)\,\log \frac{p_\theta(x,z)}{q_\phi(z\mid x)}\,dz. $$

Factor the joint distribution as $p_\theta(x,z)=p_\theta(x\mid z)\,p(z)$ and rewrite the ELBO in its standard form:

$$ \begin{aligned} \mathcal{L}(\theta,\phi;x) &=\mathbb{E}_{z\sim q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big] -\mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right). \end{aligned} $$

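The first term is the reconstruction term (the expected log-likelihood); the second is the regularizer that pulls the approximate posterior toward the prior.

For the whole dataset, the objective is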

$$ \max_{\theta,\phi}\;\sum_{i=1}^N \mathcal{L}(\theta,\phi;x^i) =\max_{\theta,\phi}\;\sum_{i=1}^N\left( \mathbb{E}_{z\sim q_\phi(z\mid x^i)}\big[\log p_\theta(x^i\mid z)\big] -\mathrm{KL}\!\left(q_\phi(z\mid x^i)\,\|\,p(z)\right)\right). $$

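Because $p_\theta(z\mid x)$ is hard to compute directly, the usual approach is to maximize the ELBO itself. For fixed $x$ and $\theta$ this is equivalent to minimizing $\mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p_\theta(z\mid x)\right)$, because $\log p_\theta(x)$ is then a constant. The optimization can be done either by joint gradient ascent on $(\theta,\phi)$ or by the alternating strategy of variational EM:

Step ①: fix $\theta$ and update $\phi$ to maximize $\mathcal{L}(\theta,\phi;x)$;

Step ②: fix $\phi$ and update $\theta$ to maximize $\mathcal{L}(\theta,\phi;x)$.

Intuitively, the first term $\mathbb{E}_{z\sim q_\phi(z\mid x)}[\log p_\theta(x\mid z)]$ is the expected reconstruction log-likelihood when $z$ is sampled from $q_\phi(z\mid x)$, so maximizing it amounts to minimizing reconstruction error; the second term $-\mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right)$ encourages the approximate posterior to stay close to the prior.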
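The decomposition $\log p_\theta(x)=\text{ELBO}+\mathrm{KL}(q_\phi\,\|\,p_\theta(z\mid x))$ can be checked numerically on a toy model where $z$ takes only two values, so the integrals become sums and the exact posterior is computable. The prior, likelihood values and variational $q$ below are hypothetical numbers chosen for illustration.

```python
import math

def elbo_and_gap(prior, lik, q):
    """prior[k] = p(z=k), lik[k] = p_theta(x | z=k) for one fixed observed x,
    q[k] = q_phi(z=k | x). Returns (log p(x), ELBO, KL(q || p(z|x)))."""
    joint = [p * l for p, l in zip(prior, lik)]   # p(x, z=k)
    evidence = sum(joint)                         # p(x) = sum_k p(x, z=k)
    posterior = [j / evidence for j in joint]     # exact p(z=k | x)
    elbo = sum(qk * math.log(jk / qk) for qk, jk in zip(q, joint) if qk > 0)
    kl = sum(qk * math.log(qk / pk) for qk, pk in zip(q, posterior) if qk > 0)
    return math.log(evidence), elbo, kl

prior = [0.5, 0.5]
lik = [0.8, 0.1]   # p(x | z) for the observed x
q = [0.6, 0.4]     # an arbitrary variational distribution
log_px, elbo, kl = elbo_and_gap(prior, lik, q)
print(log_px, elbo + kl)   # equal up to floating-point error
print(elbo <= log_px)      # the ELBO is a lower bound because KL >= 0
```

Setting `q` to the exact posterior `[8/9, 1/9]` drives the KL to zero and makes the bound tight, which is what step ① of variational EM achieves when the posterior is tractable.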

Refining the Objective

The evidence lower bound (ELBO) is

$$ \mathcal{L}(x) = \mathbb{E}_{q_\phi(z|x)}\![\log p_\theta(x|z)]- D_{KL}\!\bigl(q_\phi(z|x)\,\|\,p(z)\bigr). $$

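The KL term enters with a minus sign, so maximizing $\mathcal{L}(x)$ requires keeping $D_{KL}$ small; we now compute it in closed form. Take the prior $p(z)=\mathcal{N}(0,I)$ and let $q_\phi(z|x)=\mathcal{N}\!\bigl(\mu_\phi(x),\operatorname{diag}(\sigma^2_\phi(x))\bigr)$, i.e., a Gaussian approximation with independent dimensions.

For a single dimension $z_j$, write $q(z_j|x)=\mathcal{N}(\mu,\sigma^2)$ and $p(z_j)=\mathcal{N}(0,1)$ (dropping subscripts and parameters for brevity). Then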

$$ \begin{aligned} D_{KL}\bigl(q(z_j|x)\,\|\,p(z_j)\bigr) &= \int q(z_j|x)\,\log\frac{q(z_j|x)}{p(z_j)}\,dz_j \\ &= \int q(z_j|x)\,\log \frac{\tfrac{1}{\sqrt{2\pi}\sigma}\exp\!\left(-\tfrac{(z_j-\mu)^2}{2\sigma^2}\right)} {\tfrac{1}{\sqrt{2\pi}}\exp\!\left(-\tfrac{z_j^2}{2}\right)}\,dz_j \\ &= \frac{1}{2}\int q(z_j|x) \left( z_j^2 - \log\sigma^2 - \frac{(z_j-\mu)^2}{\sigma^2} \right) dz_j. \end{aligned} $$

Split the expectation into three terms and compute each:

$$ \int q(z_j|x)\,z_j^2\,dz_j=\mathbb{E}[z_j^2] =\operatorname{Var}(z_j)+\mathbb{E}[z_j]^2=\sigma^2+\mu^2, $$

$$ \int q(z_j|x)\,\log\sigma^2\,dz_j=\log\sigma^2, $$

$$ \int q(z_j|x)\,\frac{(z_j-\mu)^2}{\sigma^2}\,dz_j =\frac{1}{\sigma^2}\,\mathbb{E}\!\left[(z_j-\mu)^2\right]=1. $$

Hence the KL for a single dimension is

$$ D_{KL}\bigl(q(z_j|x)\,\|\,p(z_j)\bigr) =\frac{1}{2}\left(\mu^2+\sigma^2-\log\sigma^2-1\right). $$

Assuming the $J$ dimensions are independent, summing over all of them gives

$$ D_{KL}\bigl(q_\phi(z|x)\,\|\,p(z)\bigr) =\frac{1}{2}\sum_{j=1}^{J}\left(\mu_j^2+\sigma_j^2-\log\sigma_j^2-1\right). $$

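Therefore, as far as the KL term is concerned, maximizing $\mathcal{L}(x)$ means minimizing the expression above; this is the closed-form KL term commonly used when training VAEs.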
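To sanity-check the closed form, the sketch below compares it with a Monte Carlo estimate of $\mathbb{E}_q[\log q(z)-\log p(z)]$. The values of `mu`, `sigma`, the sample size and the seed are arbitrary assumptions; the two results should agree up to sampling noise.

```python
import math
import random

def kl_closed_form(mu, sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = 1/2 * sum_j (mu_j^2 + sigma_j^2 - log sigma_j^2 - 1)."""
    return 0.5 * sum(m * m + s * s - math.log(s * s) - 1.0 for m, s in zip(mu, sigma))

def log_normal(z, mean, std):
    """Log density of the 1-D Gaussian N(mean, std^2) at z."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - (z - mean) ** 2 / (2.0 * std * std)

def kl_monte_carlo(mu, sigma, n=100_000, seed=0):
    """Estimate E_q[log q(z) - log p(z)] dimension by dimension, with z_j ~ N(mu_j, sigma_j^2)."""
    rng = random.Random(seed)
    total = 0.0
    for m, s in zip(mu, sigma):
        acc = 0.0
        for _ in range(n):
            z = rng.gauss(m, s)
            acc += log_normal(z, m, s) - log_normal(z, 0.0, 1.0)
        total += acc / n
    return total

mu, sigma = [0.5, -1.0], [0.8, 1.5]   # hypothetical encoder outputs
print(kl_closed_form(mu, sigma))      # exact value
print(kl_monte_carlo(mu, sigma))      # agrees up to Monte Carlo noise
```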

Minimizing the Reconstruction Cost

For a single sample $x$, the reconstruction term is

$$ \mathbb{E}_{z\sim q_\phi(z|x)}\!\left[\log p_\theta(x|z)\right] = \int \log p_\theta(x|z)\, q_\phi(z|x)\,dz \approx \frac{1}{n}\sum_{i=1}^{n} \log p_\theta\!\left(x\,\big|\,z^{(i)}\right), $$

where $z^{(i)}\sim q_\phi(z|x)$ are Monte Carlo samples.

A common assumption is $p_\theta(x|z)=\mathcal N(\mu_\theta(z),\sigma_\theta^2(z)I)$. More simply, the covariance can be fixed to $cI$ for a constant $c>0$:

$$ p_\theta(x|z)=\mathcal N\!\left(f_\theta(z),\,cI\right). $$

Then, with $D$ the dimension of $x$,

$$ \log p_\theta(x|z) = -\frac{D}{2}\log(2\pi) - \frac{D}{2}\log c - \frac{1}{2c}\,\|x-f_\theta(z)\|^2, $$

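so maximizing the reconstruction term is equivalent to minimizing the expected squared error (dropping constants that do not depend on $\theta,\phi$):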

$$ \max\,\mathbb{E}_{q_\phi}\!\left[\log p_\theta(x|z)\right] \ \Longleftrightarrow\ \min\,\frac{1}{2c}\,\mathbb{E}_{q_\phi}\!\left[\|x-f_\theta(z)\|^2\right] \ \approx\ \min\,\frac{1}{2c}\cdot\frac{1}{n}\sum_{i=1}^{n}\big\|x-f_\theta\!\left(z^{(i)}\right)\big\|^2. $$
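The equivalence with squared error can be seen numerically: for fixed $c$, $\log p_\theta(x|z)+\frac{1}{2c}\|x-f_\theta(z)\|^2$ equals the same constant $-\frac{D}{2}\log(2\pi)-\frac{D}{2}\log c$ whatever the decoder outputs. This sketch assumes hypothetical values for $x$, $c$ and two decoder outputs.

```python
import math

def gaussian_log_lik(x, mean, c):
    """log N(x; mean, c I) for a D-dimensional x with fixed variance c > 0."""
    D = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * D * math.log(2.0 * math.pi) - 0.5 * D * math.log(c) - sq / (2.0 * c)

def scaled_squared_error(x, mean, c):
    """The decoder-dependent part: ||x - f_theta(z)||^2 / (2c)."""
    return sum((a - b) ** 2 for a, b in zip(x, mean)) / (2.0 * c)

x = [0.2, 0.9, 0.4]
c = 0.5
for f_z in ([0.0, 1.0, 0.5], [0.3, 0.7, 0.1]):   # two hypothetical decoder outputs
    # log-likelihood + scaled squared error is the same constant for every f_z
    print(gaussian_log_lik(x, f_z, c) + scaled_squared_error(x, f_z, c))
```

Because the constant does not depend on the decoder, ranking decoder outputs by log-likelihood is the same as ranking them by squared error.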

Combining this with the KL term (prior $p(z)=\mathcal N(0,I)$ and $q_\phi(z|x)=\mathcal N\!\big(\mu(x),\operatorname{diag}\sigma^2(x)\big)$) gives the single-sample negative ELBO to be minimized:

$$ \min_{\theta,\phi}\ \left[ \underbrace{\frac{1}{2}\sum_{j=1}^{J}\!\left(\sigma_j^2+\mu_j^2-\log\sigma_j^2-1\right)}_{\text{KL}\big(q_\phi(z|x)\,\|\,\mathcal N(0,I)\big)} \ +\ \underbrace{\frac{1}{2c}\cdot\frac{1}{n}\sum_{i=1}^{n}\big\|x-f_\theta\!\left(z^{(i)}\right)\big\|^2}_{-\mathbb E_{q_\phi}\![\log p_\theta(x|z)]} \right]. $$
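Putting the two terms together, here is a minimal sketch of the single-sample objective above, with the Monte Carlo reconstruction term computed through $z=\mu+\sigma\odot\varepsilon$. The encoder outputs and the linear `toy_decoder` are hypothetical stand-ins for the neural networks, and no gradients are computed.

```python
import math
import random

def kl_term(mu, sigma):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I))."""
    return 0.5 * sum(s * s + m * m - math.log(s * s) - 1.0 for m, s in zip(mu, sigma))

def recon_term(x, mu, sigma, decoder, c, n, rng):
    """Monte Carlo estimate of (1/2c) * (1/n) * sum_i ||x - f_theta(z_i)||^2 with z_i ~ q_phi(z|x)."""
    total = 0.0
    for _ in range(n):
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]  # z = mu + sigma * eps
        x_hat = decoder(z)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / (2.0 * c * n)

def negative_elbo(x, mu, sigma, decoder, c=1.0, n=1, seed=0):
    """Single-sample objective to minimize: KL term + scaled reconstruction error
    (additive constants that do not depend on theta or phi are dropped)."""
    rng = random.Random(seed)
    return kl_term(mu, sigma) + recon_term(x, mu, sigma, decoder, c, n, rng)

def toy_decoder(z):
    """Hypothetical linear decoder f_theta: R^1 -> R^2."""
    return [2.0 * z[0], -z[0]]

x = [1.0, -0.5]
mu, sigma = [0.5], [0.1]   # hypothetical encoder outputs mu_phi(x), sigma_phi(x)
print(negative_elbo(x, mu, sigma, toy_decoder, c=1.0, n=10))
```

Here $f_\theta(\mu)$ reproduces $x$ exactly, so the expected reconstruction part is $0.05\,\mathbb{E}[\varepsilon^2]/2=0.025$, and the loss is dominated by the KL penalty on the small $\sigma=0.1$.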

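On differentiability: sampling $z$ directly blocks the gradient with respect to the encoder parameters $\phi$. The reparameterization trick is used instead: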

$$ z=\mu_\phi(x)+\sigma_\phi(x)\odot\varepsilon,\qquad \varepsilon\sim\mathcal N(0,I), $$

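so $z$ becomes a deterministic, differentiable function of $\mu_\phi(x)$ and $\sigma_\phi(x)$, with all randomness isolated in $\varepsilon$, and gradients can flow back to the encoder.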
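To illustrate why the trick makes gradients available, the sketch below estimates $\partial/\partial\mu$ and $\partial/\partial\sigma$ of $\mathbb{E}_{z\sim\mathcal N(\mu,\sigma^2)}[z^2]$ by differentiating through $z=\mu+\sigma\varepsilon$ by hand. The test function $f(z)=z^2$ is an assumption, chosen because $\mathbb{E}[z^2]=\mu^2+\sigma^2$ has exact gradients $(2\mu,2\sigma)$ to compare against; in a real VAE an autodiff framework applies this chain rule.

```python
import random

def reparam_grads(mu, sigma, n=100_000, seed=0):
    """Pathwise (reparameterization) estimates of d/dmu and d/dsigma of E[z^2], z ~ N(mu, sigma^2).
    With z = mu + sigma * eps and f(z) = z^2:
      df/dmu    = f'(z) * dz/dmu    = 2z
      df/dsigma = f'(z) * dz/dsigma = 2z * eps"""
    rng = random.Random(seed)
    g_mu = g_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)   # noise drawn independently of mu and sigma
        z = mu + sigma * eps        # deterministic, differentiable in mu and sigma
        g_mu += 2.0 * z
        g_sigma += 2.0 * z * eps
    return g_mu / n, g_sigma / n

# Exact gradients of mu^2 + sigma^2 are (2 * mu, 2 * sigma) = (3.0, 1.0) here
print(reparam_grads(1.5, 0.5))   # close to (3.0, 1.0)
```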

Reparameterization Trick

Assume $q(z\mid x)=\mathcal N(\mu,\sigma^2)$ with $\mu=\mu(x)$ and $\sigma=\sigma(x)>0$. Let $\epsilon\sim\mathcal N(0,1)$ be independent of $x$ and define the reparameterization

$$ z=\mu+\sigma\,\epsilon. $$

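Then we only need to sample $\epsilon\sim\mathcal N(0,1)$ and plug it into the formula above to get a sample distributed exactly as one drawn from $q(z\mid x)$.

Proof (conditioned on $x$): an affine function of a Gaussian variable is again Gaussian, so it suffices to match the mean and the variance:

$$ \mathbb E[z\mid x]=\mu+\sigma\,\mathbb E[\epsilon]=\mu, $$

$$ \mathrm{Var}(z\mid x)=\mathbb E\big[(z-\mu)^2\mid x\big] =\mathbb E\big[(\sigma\epsilon)^2\mid x\big] =\sigma^2\,\mathbb E[\epsilon^2]=\sigma^2. $$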
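An empirical check of the same statement: samples generated through $z=\mu+\sigma\epsilon$ have the target mean and variance. The values $\mu=2$, $\sigma=0.5$ and the sample size are arbitrary choices.

```python
import random

def sample_reparam(mu, sigma, n, seed=0):
    """Draw n samples of z ~ N(mu, sigma^2) via z = mu + sigma * eps with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    return [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]

zs = sample_reparam(mu=2.0, sigma=0.5, n=100_000)
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)
print(mean, var)   # approximately 2.0 and 0.25
```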

Model Diagram

```mermaid
%%{init: {'theme':'base','themeVariables':{ 'background':'#E8F3FF', 'primaryColor':'#FFFFFF', 'primaryBorderColor':'#3B82F6', 'primaryTextColor':'#0F172A', 'lineColor':'#2563EB', 'fontFamily':'Inter, Arial' }}}%%
flowchart LR
  %% --- Classes & link style ---
  classDef node fill:#FFFFFF,stroke:#3B82F6,stroke-width:1.5px,rx:10px,ry:10px,color:#0F172A;
  classDef soft fill:#F0F7FF,stroke:#60A5FA,stroke-width:1.5px,rx:10px,ry:10px,color:#0F172A;
  linkStyle default stroke:#2563EB,stroke-width:2px;

  %% --- Encoder ---
  subgraph ENCGRP["Encoder $q_φ(z|x)$"]
    X["Training sample $x$"]
    ENC["Neural network"]
    MU["Mean $μ_φ(x)$"]
    LOGVAR["Log-variance $log σ_φ²(x)$"]
    X --> ENC
    ENC --> MU
    ENC --> LOGVAR
  end
  MU -- "reparameterization" --> Z(("Latent $z$"))
  LOGVAR -- "reparameterization" --> Z

  %% --- Decoder ---
  subgraph DECGRP["Decoder $p_θ(x|z)$"]
    Z --> DEC["Neural network $f_θ$"]
    DEC --> XHAT["Reconstruction $x̂$"]
  end

  %% --- Apply styles ---
  class X,ENC,MU,LOGVAR,DEC,XHAT node;
  class Z soft;
  style ENCGRP fill:#E6F0FF,stroke:#60A5FA,stroke-width:2px;
  style DECGRP fill:#E6F0FF,stroke:#60A5FA,stroke-width:2px;
```

$$ \text{ELBO}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z\mid x)}\!\big[\log p_\theta(x\mid z)\big]- \mathrm{KL}\!\big(q_\phi(z\mid x)\,\|\,p(z)\big). $$

Maximizing the ELBO is equivalent to minimizing its negative, which is commonly used as the training loss:

$$ \min_{\theta,\phi}\;\mathcal{L}(x)= \underbrace{-\,\mathbb{E}_{q_\phi(z\mid x)}\!\big[\log p_\theta(x\mid z)\big]}_{\text{reconstruction loss}}+ \underbrace{\mathrm{KL}\!\big(q_\phi(z\mid x)\,\|\,\mathcal{N}(0,I)\big)}_{\text{prior matching}}. $$

With a diagonal Gaussian approximate posterior $q_\phi(z\mid x)=\mathcal N\!\big(\mu_\phi(x),\mathrm{diag}(\sigma_\phi^2(x))\big)$ (anisotropic, since each dimension has its own variance) and prior $p(z)=\mathcal N(0,I)$, the KL has the closed form

$$ \mathrm{KL}\!\big(q_\phi(z\mid x)\,\|\,\mathcal N(0,I)\big) = \frac{1}{2}\sum_{j=1}^{J}\Big(\sigma_j^2+\mu_j^2-\log\sigma_j^2-1\Big), $$

where $\mu_j,\sigma_j$ are the components of $\mu_\phi(x),\sigma_\phi(x)$. In practice the network usually predicts $\log\sigma_\phi^2(x)$ directly.

Training Stage (learning)

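Encoder $q_{\phi}(z\mid x)$: takes the sample $x$ as input; the neural network outputs the parameters of the approximate posterior:
$$ \mu_\phi(x),\;\log\sigma_\phi^2(x). $$
From these, $\sigma_\phi(x)=\exp\!\big(\tfrac{1}{2}\log\sigma_\phi^2(x)\big)$.
Reparameterized (differentiable) sampling: draw $\epsilon \sim \mathcal N(0, I)$ from the standard normal and construct
$$ z=\mu_\phi(x)+\sigma_\phi(x)\odot \epsilon, $$
so that gradients can flow through $z$ back to $\phi$.
Decoder $p_{\theta}(x\mid z)$: feed $z$ into the decoder $f_\theta$ to obtain the reconstruction $\hat x=f_\theta(z)$, which parameterizes the likelihood $p_\theta(x\mid z)$.
Loss function (minimized):
$$ \mathcal L(x) = -\,\mathbb E_{q_\phi(z\mid x)}[\log p_\theta(x\mid z)] + \mathrm{KL}\!\big(q_\phi(z\mid x)\,\|\,\mathcal N(0,I)\big) $$
- With a Bernoulli likelihood $p_\theta(x\mid z)=\mathrm{Bernoulli}(\hat x)$ (images normalized to $[0,1]$), the reconstruction term is the binary cross-entropy (BCE).
- With an isotropic Gaussian $p_\theta(x\mid z)=\mathcal N(\hat x,\sigma_x^2 I)$ (usually with a fixed $\sigma_x$), the reconstruction term equals the MSE up to a constant factor and an additive constant.
Optimization: backpropagate through $(\phi,\theta)$ and update the parameters until convergence.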
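The training steps above can be sketched for a single example as follows. This is a minimal, framework-free illustration under stated assumptions: the encoder outputs `mu` and `logvar` are given directly, `sigmoid_decoder` is a hypothetical stand-in for $f_\theta$, and the backpropagation and update step is omitted (in practice an autodiff library handles it). The `likelihood` argument selects BCE (Bernoulli) or squared error (Gaussian with fixed $\sigma_x$, constants dropped).

```python
import math
import random

def bce(x, x_hat, eps=1e-7):
    """-log Bernoulli(x | x_hat) summed over pixels, for targets x in [0, 1]."""
    total = 0.0
    for a, p in zip(x, x_hat):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total -= a * math.log(p) + (1.0 - a) * math.log(1.0 - p)
    return total

def squared_error(x, x_hat):
    """Sum of squared errors: -log N(x | x_hat, sigma_x^2 I) up to scale and a constant."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def kl_from_logvar(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) expressed with logvar = log sigma^2."""
    return 0.5 * sum(math.exp(lv) + m * m - lv - 1.0 for m, lv in zip(mu, logvar))

def training_loss(x, mu, logvar, decoder, likelihood="bernoulli", seed=0):
    """One-sample loss L(x) = reconstruction + KL, following the steps above (no gradient step)."""
    rng = random.Random(seed)
    sigma = [math.exp(0.5 * lv) for lv in logvar]                  # sigma = exp(logvar / 2)
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]  # reparameterized sample
    x_hat = decoder(z)
    if likelihood == "bernoulli":
        recon = bce(x, x_hat)
    else:
        recon = squared_error(x, x_hat)
    return recon + kl_from_logvar(mu, logvar)

def sigmoid_decoder(z):
    """Hypothetical decoder f_theta mapping a 1-D z to three pixel probabilities."""
    return [1.0 / (1.0 + math.exp(-w * z[0])) for w in (3.0, -3.0, 0.0)]

x = [1.0, 0.0, 0.5]   # hypothetical image normalized to [0, 1]
print(training_loss(x, mu=[1.0], logvar=[-2.0], decoder=sigmoid_decoder))
print(training_loss(x, mu=[1.0], logvar=[-2.0], decoder=sigmoid_decoder, likelihood="gaussian"))
```

Predicting `logvar` rather than $\sigma$ keeps the network output unconstrained, while `exp(0.5 * logvar)` guarantees $\sigma>0$.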

Generation Stage (inference)

Sample $z \sim \mathcal N(0,I)$ directly from the prior and feed it to the decoder $f_\theta$ to obtain a new sample $\tilde x=f_\theta(z)$.
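A minimal sketch of the generation stage, assuming a hypothetical linear `toy_decoder` in place of a trained $f_\theta$. As in the text, the decoder output $f_\theta(z)$ is used directly as the new sample, and the encoder is not needed at this stage.

```python
import random

def generate(decoder, latent_dim, n, seed=0):
    """Sample z ~ N(0, I) from the prior and decode: x_tilde = f_theta(z)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        samples.append(decoder(z))
    return samples

def toy_decoder(z):
    """Hypothetical decoder f_theta: R^2 -> R^3 (a fixed linear map)."""
    return [z[0] + z[1], z[0] - z[1], 0.5 * z[0]]

for x_tilde in generate(toy_decoder, latent_dim=2, n=3):
    print(x_tilde)
```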

CVAE (Conditional VAE)

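Conditioning: introduce a condition $c$ (e.g., a class label or text features) and inject it into both the encoder and the decoder:
- Encoder conditioning: $\text{Encoder}(x,c)\to(\mu_z,\sigma_z)$, i.e., $q_\phi(z\mid x,c)$;
- Decoder conditioning: $\text{Decoder}(z,c)\to \hat x$, i.e., using $p_\theta(x\mid z,c)$.
Objective (conditional ELBO): $$ \mathcal L_{\mathrm{CVAE}}(\theta,\phi) = \mathbb{E}_{q_\phi(z\mid x,c)}\!\left[\log p_\theta(x\mid z,c)\right] - \mathrm{KL}\!\left(q_\phi(z\mid x,c)\,\|\,p(z\mid c)\right) $$ In practice $p(z\mid c)=\mathcal N(0,I)$ (independent of $c$) is often used for simplicity.
Effect: by specifying $c$ at generation time, we can generate samples under a chosen condition (e.g., a handwritten digit "7").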
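One common way to inject $c$, assumed here rather than prescribed by the text, is to one-hot encode it and concatenate it to the encoder input $x$ and to the decoder input $z$. The helper names and the 10-class digit setting are hypothetical.

```python
def one_hot(label, num_classes):
    """Encode a class label c as a one-hot vector."""
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def encoder_input(x, c_vec):
    """Input to q_phi(z | x, c): concatenation [x, c]."""
    return list(x) + list(c_vec)

def decoder_input(z, c_vec):
    """Input to p_theta(x | z, c): concatenation [z, c]."""
    return list(z) + list(c_vec)

# Hypothetical MNIST-style setting: 10 digit classes, request the digit "7"
c7 = one_hot(7, 10)
z = [0.3, -1.2]                 # e.g. a draw from the prior N(0, I)
dec_in = decoder_input(z, c7)   # length-12 vector fed to the decoder network
print(len(dec_in), dec_in.index(1.0))   # 12 9
```

To generate a "7", draw $z\sim\mathcal N(0,I)$, build `decoder_input(z, one_hot(7, 10))`, and pass it through the trained decoder; changing only the label changes the requested class.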

CVAE Mathematical Derivation

Let the generative model be $p_\theta(x\mid z,c)$, the prior $p_\theta(z\mid c)$, and the variational distribution $q_\phi(z\mid x,c)$. From

$$ p(x,z\mid c)=p(z\mid x,c)\,p(x\mid c) $$

we obtain the identity

$$ \log p(x\mid c)=\log\frac{p(x,z\mid c)}{p(z\mid x,c)}. $$

Integrating both sides against $q_\phi(z\mid x,c)$, the left-hand side is

$$ \int q_\phi(z\mid x,c)\,\log p(x\mid c)\,\mathrm{d}z=\log p(x\mid c), $$

and the right-hand side is

$$ \int q_\phi(z\mid x,c)\,\log\frac{p(x,z\mid c)}{p(z\mid x,c)}\,\mathrm{d}z. $$

Adding and subtracting the same term $\int q_\phi(z\mid x,c)\,\log q_\phi(z\mid x,c)\,\mathrm{d}z$ on the right gives

$$ \begin{aligned} \log p(x\mid c) &=\int q_\phi(z\mid x,c)\,\log\frac{p(x,z\mid c)}{q_\phi(z\mid x,c)}\,\mathrm{d}z +\int q_\phi(z\mid x,c)\,\log\frac{q_\phi(z\mid x,c)}{p(z\mid x,c)}\,\mathrm{d}z \\ &=\underbrace{\int q_\phi(z\mid x,c)\,\log\frac{p(x,z\mid c)}{q_\phi(z\mid x,c)}\,\mathrm{d}z}_{\text{ELBO}(\theta,\phi;x,c)} +\underbrace{KL\!\left(q_\phi(z\mid x,c)\,\|\,p(z\mid x,c)\right)}_{\ge 0}. \end{aligned} $$

Since $KL\ge 0$, we obtain the variational lower bound

$$ \log p(x\mid c)\;\ge\;\mathcal{L}_{\text{ELBO}}(\theta,\phi;x,c) =\int q_\phi(z\mid x,c)\,\log\frac{p(x,z\mid c)}{q_\phi(z\mid x,c)}\,\mathrm{d}z. $$

Factoring the joint distribution as

$$ p(x,z\mid c)=p_\theta(x\mid z,c)\,p_\theta(z\mid c), $$

we can write the bound in the more familiar "reconstruction term minus KL term" form:

$$ \begin{aligned} \mathcal{L}_{\text{ELBO}}(\theta,\phi;x,c) &=\int q_\phi(z\mid x,c)\,\Big[\log p_\theta(x\mid z,c)+\log p_\theta(z\mid c)-\log q_\phi(z\mid x,c)\Big]\,\mathrm{d}z \\ &=\mathbb{E}_{z\sim q_\phi(z\mid x,c)}\!\big[\log p_\theta(x\mid z,c)\big] -KL\!\left(q_\phi(z\mid x,c)\,\|\,p_\theta(z\mid c)\right). \end{aligned} $$

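The first term is the reconstruction term and the second is the regularizer (KL). For given $x,c$ and fixed $\theta$, $\log p(x\mid c)$ does not depend on the variational parameters $\phi$, so maximizing $\mathcal{L}_{\text{ELBO}}$ over $\phi$ is equivalent to minimizing $KL\!\left(q_\phi(z\mid x,c)\,\|\,p(z\mid x,c)\right)$; in practice the bound is maximized by optimizing $\theta$ and $\phi$ jointly.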

Maximum Likelihood

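Comparing the VAE and CVAE objectives (maximizing the ELBO to approximately maximize the log-likelihood):

CVAE:

$$ \log p_\theta(x\mid c)\;\ge\; \mathbb{E}_{q_\phi(z\mid x,c)} \!\big[\underbrace{\log p_\theta(x\mid z,c)}_{\text{reconstruction ①}}\big] \;-\; \underbrace{\mathrm{KL}\!\left(q_\phi(z\mid x,c)\,\|\,p(z\mid c)\right)}_{\text{regularizer ②}} $$

VAE:

$$ \log p_\theta(x)\;\ge\; \mathbb{E}_{q_\phi(z\mid x)} \!\big[\underbrace{\log p_\theta(x\mid z)}_{\text{reconstruction ①}}\big] \;-\; \underbrace{\mathrm{KL}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right)}_{\text{regularizer ②}} $$

Notes:

$q_\phi(z\mid x,c)$ and $q_\phi(z\mid x)$ are assumed Gaussian; compared with the VAE, the CVAE inference network $q_\phi(z\mid x,c)$ additionally takes the condition $c$ as input.
The generative distributions $p_\theta(x\mid z,c)$ and $p_\theta(x\mid z)$ are usually chosen by data type: Gaussian for continuous data, Bernoulli for binary data, and so on; in the CVAE the generative network $p_\theta(x\mid z,c)$ likewise additionally takes $c$ as input.
The CVAE prior is often written $p(z\mid c)$ (in practice the standard normal $p(z)$ can be used as a condition-independent simplification).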
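When the prior $p(z\mid c)$ is not the standard normal (for example, a hypothetical prior network that outputs $\mu_c,\sigma_c$ from $c$), the KL between two diagonal Gaussians is still closed-form; per dimension it is $\log\frac{\sigma_p}{\sigma_q}+\frac{\sigma_q^2+(\mu_q-\mu_p)^2}{2\sigma_p^2}-\frac{1}{2}$. The sketch checks that it reduces to the VAE formula when $p=\mathcal N(0,I)$; all numbers are made up.

```python
import math

def kl_diag_gauss(mu_q, sigma_q, mu_p, sigma_p):
    """KL(N(mu_q, diag(sigma_q^2)) || N(mu_p, diag(sigma_p^2))), summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq * sq + (mq - mp) ** 2) / (2.0 * sp * sp) - 0.5
    return total

mu_q, sigma_q = [0.5, -0.2], [0.7, 1.1]   # hypothetical q_phi(z | x, c)
# p(z | c) = N(0, I): reduces to the plain-VAE closed form
print(kl_diag_gauss(mu_q, sigma_q, [0.0, 0.0], [1.0, 1.0]))
# hypothetical learned prior p(z | c) = N(mu_c, diag(sigma_c^2))
print(kl_diag_gauss(mu_q, sigma_q, [0.4, 0.0], [0.8, 1.0]))
```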

Two Interpretations

Interpretation 1: set $p(z)=\mathcal{N}(0,I)$ and simply choose $p(z\mid c)=p(z)=\mathcal{N}(0,I)$.

Interpretation 2: assume $z$ and $c$ are independent, so $p(z\mid c)=p(z)$; if we further set $p(z)=\mathcal{N}(0,I)$, then $p(z\mid c)=\mathcal{N}(0,I)$.

Conclusion

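We started from the AE, moved on to the VAE, which can be sampled from, and then to the CVAE, which generates on demand given a condition. The VAE provides reconstruction together with a continuous latent space; the CVAE makes generation controllable and more practical. In training, balancing the reconstruction and KL terms (e.g., with KL annealing or a β-VAE weighting) and injecting the condition $c$ into both the encoder and the decoder usually yields a robust baseline.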
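As one illustration of balancing the two terms, the sketch below weights the KL term by a factor $\beta$ that is linearly annealed from 0 to `beta_max`. The linear schedule, the warm-up length and the function names are assumptions; other schedules are also used, and `beta_max > 1` corresponds to a β-VAE-style weighting.

```python
def kl_weight(step, warmup_steps, beta_max=1.0):
    """Linear KL annealing: the weight grows from 0 to beta_max over warmup_steps, then stays there."""
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, step / warmup_steps)

def weighted_loss(recon, kl, step, warmup_steps, beta_max=1.0):
    """Loss = reconstruction + beta(step) * KL."""
    return recon + kl_weight(step, warmup_steps, beta_max) * kl

for step in (0, 250, 500, 1000, 2000):
    print(step, kl_weight(step, warmup_steps=1000))
```

Starting with a small KL weight lets the decoder first learn to use $z$ before the prior-matching pressure is fully applied.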


Author: xiadengma
