Hidden Markov model (HMM) [ 1, 2 ] is a powerful model to describe sequential data and has been widely used in speech signal processing [ 3-5 ], computer vision [ 6-8 ], longitudinal data analysis [ 9 ], social networks [ 10-12 ] and so on. An HMM typically assumes the system has K internal states, and the transition of states forms a Markov chain. The system state cannot be observed directly, thus we need to infer the hidden states and system parameters based on observations. Due to the existence of latent variables, the Expectation-Maximisation (EM) algorithm [ 13, 14 ] is often used to learn an HMM. The main difficulty is to calculate site marginal distributions and pairwise marginal distributions based on the posterior distribution of latent variables. The forward-backward algorithm was specifically designed to tackle this problem. The derivation of the forward-backward algorithm heavily relies on HMM assumptions and probabilistic relationships between quantities, thus requiring the parameters in the posterior distribution to have explicit probabilistic meanings.
Bayesian HMM [ 15-22 ] further imposes priors on the parameters of HMM, and the resulting model is more robust. It has been demonstrated that Bayesian HMM often outperforms HMM in applications. However, the learning process of a Bayesian HMM is more challenging since the posterior distribution of latent variables is intractable. Mean-field theory-based variational inference is often utilised in the E-step of the EM algorithm, which tries to find an optimal approximation of the posterior distribution in a factorised family. The variational inference iteration also involves computing site marginal distributions and pairwise marginal distributions given the joint distribution of system state indicator variables. Existing works [ 15-23 ] directly apply the forward-backward algorithm to obtain these values without justification. This is not theoretically sound and the result is not guaranteed to be correct, since the requirements of the forward-backward algorithm are not met in this case.
In this paper, we prove that the forward-backward algorithm can be applied in more general cases where the parameters have no probabilistic meanings. The first proof converts the general case to an HMM and uses the correctness of the forward-backward algorithm on HMM to prove the claim. The second proof is model-free, which derives the forward-backward algorithm in a totally different way. The new derivation does not rely on HMM assumptions and merely utilises matrix techniques to rewrite the desired quantities. Therefore, this derivation naturally proves that it is unnecessary to make probabilistic requirements on the parameters of the forward-backward algorithm. Specifically, we justify that heuristically applying the forward-backward algorithm in the variational learning of Bayesian HMM is theoretically sound and guaranteed to return the correct result.