# Variational lower-bound for hidden Markov models

If you define the forward messages as the filtering distribution $$p(x_t \mid y_{1:t})$$ (as Beal does), then the normalization constant from each update step is $$p(y_t \mid y_{1:t-1})$$, and the log-likelihood $$\log p(y_{1:T})$$ is the sum of the log normalizers.

If you define the forward messages as the joint $$p(y_{1:t}, x_t)$$, as Emily does (and as Matt and we do), then the normalization constant (the sum of the message over $$x_t$$) is $$p(y_{1:t})$$, so you need to divide by it to recover the conditionals. Note that in our case the messages are “unnormalized” at each time step, so they are exactly this joint. Write a blog post on this.
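A minimal NumPy sketch of the two conventions on a toy HMM (all parameters here are hypothetical, made up for illustration): the first recursion normalizes at every step and accumulates the per-step normalizers $$p(y_t \mid y_{1:t-1})$$; the second propagates the unnormalized joint $$p(y_{1:t}, x_t)$$ and sums over $$x_t$$ at the end. Both recover the same $$\log p(y_{1:T})$$.

```python
import numpy as np

# Toy HMM with hypothetical parameters, just to compare the conventions.
rng = np.random.default_rng(0)
K, T = 3, 5                        # number of hidden states, time steps
pi = np.full(K, 1.0 / K)           # initial distribution p(x_1)
A = rng.dirichlet(np.ones(K), K)   # transitions, row i is p(x_{t+1} | x_t = i)
B = rng.dirichlet(np.ones(4), K)   # emissions, row i is p(y_t | x_t = i)
y = rng.integers(0, 4, T)          # an observation sequence

# Convention 1 (Beal): messages are filtering distributions p(x_t | y_{1:t});
# the step-t normalizer c_t is p(y_t | y_{1:t-1}).
alpha = pi * B[:, y[0]]
c = [alpha.sum()]                  # c_1 = p(y_1)
alpha = alpha / c[0]
for t in range(1, T):
    alpha = (alpha @ A) * B[:, y[t]]
    c.append(alpha.sum())          # c_t = p(y_t | y_{1:t-1})
    alpha = alpha / c[-1]
log_lik_filtering = np.sum(np.log(c))   # log p(y_{1:T}) = sum_t log c_t

# Convention 2 (ours): messages are the joints p(y_{1:t}, x_t); summing the
# final message over x_t gives p(y_{1:T}) directly, no per-step division.
joint = pi * B[:, y[0]]
for t in range(1, T):
    joint = (joint @ A) * B[:, y[t]]
log_lik_joint = np.log(joint.sum())

assert np.allclose(log_lik_filtering, log_lik_joint)
```

Dividing the joint message by its normalizer $$p(y_{1:t})$$ recovers the filtering message, which is why the two recursions agree; in practice the normalized version is preferred for long sequences, since the unnormalized joint underflows.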