# Variational lower bound for hidden Markov models

If you define the forward messages as the filtering distributions $$p(x_t | y_{1:t})$$ (as Beal does), then the normalization constant from each update is $$p(y_t | y_{1:t-1})$$.
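
As a quick sanity check of the filtering convention, here is a minimal sketch in Python for a small discrete HMM. The names `filter_forward` and `brute_force_likelihood`, and the toy parameters `pi0`, `A`, `B`, are illustrative assumptions, not taken from any of the references. Each normalizer `c` is $$p(y_t | y_{1:t-1})$$, so by the chain rule their product should equal $$p(y_{1:T})$$, which the sketch compares against brute-force enumeration over all state sequences.

```python
import itertools
import math


def filter_forward(pi0, A, B, ys):
    """Normalized forward pass for a discrete HMM (illustrative sketch).

    pi0[i]  : p(x_1 = i)
    A[i][j] : p(x_{t+1} = j | x_t = i)
    B[i][y] : p(y_t = y | x_t = i)
    Returns the filtering distributions p(x_t | y_{1:t}) and the
    normalizers c_t = p(y_t | y_{1:t-1}).
    """
    K = len(pi0)
    filtered, normalizers = [], []
    predicted = list(pi0)  # p(x_1), then p(x_t | y_{1:t-1})
    for y in ys:
        # Unnormalized update: p(x_t, y_t | y_{1:t-1})
        unnorm = [predicted[i] * B[i][y] for i in range(K)]
        c = sum(unnorm)  # p(y_t | y_{1:t-1})
        alpha = [u / c for u in unnorm]  # p(x_t | y_{1:t})
        filtered.append(alpha)
        normalizers.append(c)
        # One-step prediction: p(x_{t+1} | y_{1:t})
        predicted = [sum(alpha[i] * A[i][j] for i in range(K)) for j in range(K)]
    return filtered, normalizers


def brute_force_likelihood(pi0, A, B, ys):
    """p(y_{1:T}) by summing the joint over every hidden state sequence."""
    K = len(pi0)
    total = 0.0
    for xs in itertools.product(range(K), repeat=len(ys)):
        p = pi0[xs[0]] * B[xs[0]][ys[0]]
        for t in range(1, len(ys)):
            p *= A[xs[t - 1]][xs[t]] * B[xs[t]][ys[t]]
        total += p
    return total


# Toy two-state, two-symbol model (made-up numbers).
pi0 = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]
ys = [0, 1, 1, 0]

filtered, normalizers = filter_forward(pi0, A, B, ys)
log_lik = sum(math.log(c) for c in normalizers)  # log p(y_{1:T})
```

Summing the logs of the normalizers, rather than multiplying them, is the usual way to avoid underflow on long sequences.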

If you define the forward messages as the joint $$p(y_{1:t}, x_t)$$ like Emily does (and like Matt and we do), then the ...

# Convergent Series and lim inf

This little result came up while proving the convergence of a stochastic gradient algorithm. I am writing it down, after discussions with Matt Johnson and Alex Tank, so that I remember it.

Let $$a_1, a_2, \ldots$$ be a sequence of positive numbers. If \(\sum_{n=1}^\infty \frac{1}{n ...