From log evidence to the ELBO
Starting from \(\log p_\theta(x)\), insert the variational posterior \(q_\phi(z \mid x)\) and rearrange:
\[
\log p_\theta(x) = \mathcal{L}(\theta, \phi; x) + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right),
\]
where
\[
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x, z) - \log q_\phi(z \mid x)\right].
\]
Because the KL term is nonnegative, \(\mathcal{L}\) is a lower bound on the log evidence, hence the name evidence lower bound.
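Factoring \(p_\theta(x, z) = p_\theta(x \mid z)\,p(z)\) yields the form that is actually optimized in practice: a reconstruction term minus a KL penalty toward the prior. This decomposition is standard, and its sign convention matches the loss in the implementation sketch:
\[
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right).
\]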
Why amortization matters
Classical variational inference solves a separate optimization problem for each datapoint. Amortized inference replaces that with a shared inference network \(q_\phi(z \mid x)\). This is computationally efficient, but it introduces an amortization gap: a single encoder must approximate the optimal local posterior for every datapoint at once.
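To make the contrast concrete, here is a back-of-the-envelope parameter count. All sizes (N, d_x, d_z, h) are illustrative assumptions, not values from the text: per-datapoint VI stores a separate \((\mu_i, \log\sigma_i^2)\) for every datapoint, so its memory grows linearly with the dataset, while an amortized encoder has a fixed size.

```python
# Illustrative sizes only (hypothetical, not from the text)
N = 1_000_000    # number of datapoints
d_x = 784        # observation dimensionality
d_z = 8          # latent dimensionality
h = 256          # encoder hidden width

# Classical VI: one (mu, logvar) pair of length d_z per datapoint
per_datapoint_params = N * 2 * d_z

# Amortized VI: a single two-layer MLP encoder x -> (mu, logvar),
# shared across all datapoints (weights plus biases)
encoder_params = (d_x * h + h) + (h * 2 * d_z + 2 * d_z)

print(per_datapoint_params)  # grows linearly with N
print(encoder_params)        # constant in N
```

At this dataset size the shared encoder is orders of magnitude smaller, and its cost does not change as N grows.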
Minimal implementation sketch
import torch
import torch.nn.functional as F

# Encode x into the parameters of the Gaussian posterior q_phi(z | x)
mu, logvar = encoder(x)
std = torch.exp(0.5 * logvar)
# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
eps = torch.randn_like(std)
z = mu + eps * std
recon = decoder(z)
# Negative ELBO: reconstruction term plus analytic KL to the standard normal prior
recon_loss = F.binary_cross_entropy(recon, x, reduction="sum")
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon_loss + kl
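The analytic KL term above can be sanity-checked against a Monte Carlo estimate. The sketch below does this in NumPy for a single scalar latent; the particular values of mu and logvar are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, logvar = 0.7, -0.3
std = np.exp(0.5 * logvar)

# Closed-form KL( N(mu, std^2) || N(0, 1) ), matching the kl term in the loss
kl_closed = -0.5 * (1 + logvar - mu**2 - np.exp(logvar))

# Monte Carlo estimate: E_q[ log q(z) - log p(z) ] with z ~ N(mu, std^2)
z = mu + std * rng.standard_normal(200_000)
log_q = -0.5 * (np.log(2 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar))
log_p = -0.5 * (np.log(2 * np.pi) + z**2)
kl_mc = np.mean(log_q - log_p)
```

With 200k samples the two estimates agree to roughly two decimal places, which is a quick way to catch sign errors in the KL term.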
The value of this framework extends well beyond VAEs: ELBO-style reasoning appears across latent-variable modelling, probabilistic sequence models, and Bayesian deep learning.