(A Creative Blog Name Here)

Code, math, and other things I find useful

Variational lower-bound for hidden Markov models

If you define the forward messages as the filtering distribution \(p(x_t \mid y_{1:t})\) (as Beal does), then the normalization constant from the update at time \(t\) is \(p(y_t \mid y_{1:t-1})\).
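To see where that normalizer comes from, write out the standard forward update (the symbol \(\alpha_t\) for the message is my notation, not the source's):

\[
\tilde\alpha_t(x_t) \;=\; p(y_t \mid x_t) \sum_{x_{t-1}} p(x_t \mid x_{t-1})\,\alpha_{t-1}(x_{t-1}) \;=\; p(x_t, y_t \mid y_{1:t-1}),
\]

so summing out \(x_t\) gives

\[
\sum_{x_t} \tilde\alpha_t(x_t) \;=\; p(y_t \mid y_{1:t-1}),
\qquad
\alpha_t(x_t) \;=\; \frac{\tilde\alpha_t(x_t)}{p(y_t \mid y_{1:t-1})}.
\]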

If you instead define the forward messages as the joint \(p(y_{1:t}, x_t)\), as Emily does (and as Matt and we do), then the normalization constant is \(p(y_{1:t})\), so you need to divide by it to recover the conditionals. Note that in our case the messages are “unnormalized” at each time step, so they are exactly the joint. Write a blog post on this.
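The two conventions agree on the marginal likelihood: under the filtering convention, \(p(y_{1:T})\) is the product of the per-step normalizers \(p(y_t \mid y_{1:t-1})\); under the joint convention, it is the sum of the final message over \(x_T\). Here is a minimal sketch checking that on a random toy HMM (all names — `pi`, `A`, `B`, `y` — are my own, not from any particular codebase):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HMM: K hidden states, V discrete observation symbols, T time steps
K, V, T = 3, 4, 5
pi = np.full(K, 1.0 / K)                 # initial distribution p(x_1)
A = rng.dirichlet(np.ones(K), size=K)    # A[i, j] = p(x_t = j | x_{t-1} = i)
B = rng.dirichlet(np.ones(V), size=K)    # B[i, v] = p(y_t = v | x_t = i)
y = rng.integers(V, size=T)              # an arbitrary observation sequence

# Convention 1 (filtering, as in Beal): normalize at every step, so
# alpha holds p(x_t | y_{1:t}) and each normalizer is p(y_t | y_{1:t-1}).
alpha = pi * B[:, y[0]]
c = [alpha.sum()]                        # c[0] = p(y_1)
alpha = alpha / c[0]
for t in range(1, T):
    alpha = (alpha @ A) * B[:, y[t]]     # p(x_t, y_t | y_{1:t-1})
    c.append(alpha.sum())                # p(y_t | y_{1:t-1})
    alpha = alpha / c[-1]
log_lik_filtering = np.sum(np.log(c))    # log p(y_{1:T}) = sum_t log c_t

# Convention 2 (joint): never normalize, so alpha holds p(y_{1:t}, x_t)
# and summing the final message over x_T gives p(y_{1:T}) directly.
alpha_joint = pi * B[:, y[0]]
for t in range(1, T):
    alpha_joint = (alpha_joint @ A) * B[:, y[t]]
log_lik_joint = np.log(alpha_joint.sum())

print(np.allclose(log_lik_filtering, log_lik_joint))  # True
```

The joint messages underflow for long sequences (each step multiplies in factors less than one), which is the usual practical argument for the filtering convention: you keep the per-step normalizers and accumulate the log-likelihood as a sum of their logs.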