[This is a guest post on the recent EcoSta2017 (Econometrics and Statistics) conference in Hong Kong, contributed by Chris Drovandi from QUT, Brisbane.]
There were (at least) two sessions on Bayesian Computation at the recent EcoSta (Econometrics and Statistics) 2017 conference in Hong Kong; below is my review of them. My overall impression of the conference is that there were lots of interesting talks, albeit many in financial time series, which is not my area. Even so, I managed to pick up a few ideas and concepts that could be useful in my research. One criticism I had was that there were too many parallel sessions, which made choosing difficult and left some sessions very poorly attended. Another criticism, from many participants I spoke to, was that the conference venue was relatively far from the city.
In the first session (chaired by Robert Kohn), Minh-Ngoc Tran spoke about this paper on Bayesian estimation of high-dimensional Copula models with mixed discrete/continuous margins. Copula models with all continuous margins are relatively easy to deal with, but when the margins are discrete or mixed there are issues with computing the likelihood. The main idea of the paper is to re-write the intractable likelihood as an integral over a hypercube of ≤J dimensions (where J is the number of variables), which can then be estimated unbiasedly (with variance reduction by using randomised quasi-MC numbers). The paper develops advanced (correlated) pseudo-marginal and variational Bayes methods for inference.
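To fix ideas, here is a minimal sketch of the generic randomised quasi-MC trick this relies on: scrambled Sobol points give unbiased estimates of an integral over the unit hypercube, with replication providing a variance estimate. The integrand below is a toy stand-in for the copula likelihood contribution, not the paper's actual estimator.

```python
import numpy as np
from scipy.stats import qmc

def rqmc_estimate(integrand, dim, m=10, n_reps=20, seed=0):
    """Unbiased RQMC estimate of an integral over [0,1]^dim.

    Each scrambled Sobol replicate is itself unbiased; averaging the
    replicates gives the final estimate plus a standard error."""
    rng = np.random.default_rng(seed)
    reps = []
    for _ in range(n_reps):
        sampler = qmc.Sobol(d=dim, scramble=True, seed=rng)
        u = sampler.random_base2(m=m)      # 2**m points in [0,1]^dim
        reps.append(integrand(u).mean())   # one unbiased replicate
    reps = np.asarray(reps)
    return reps.mean(), reps.std(ddof=1) / np.sqrt(n_reps)

# toy check: the integral of prod(u) over [0,1]^3 is (1/2)**3 = 0.125
est, se = rqmc_estimate(lambda u: u.prod(axis=1), dim=3)
```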
In the following talk, Chris Carter spoke about different types of pseudo-marginal methods for state space models, namely particle marginal Metropolis-Hastings and particle Gibbs. Chris suggested that combining these methods into a single algorithm can further improve mixing.
Robert Kohn then discussed this paper about flexible modelling of multivariate distributions. The principle is the same as in copula modelling, in that the marginals are modelled separately from the dependence structure, but the resulting model is not a copula. After the marginals are modelled, their cdfs can be used to transform the data so that each margin is roughly standard normal, and a multivariate normal mixture model is then fitted to the transformed data. The paper presents theory showing that the proposed model can get arbitrarily close to any general (non-parametric) copula model, whereas some other popular parametric copula models cannot. Interestingly, we had previously proposed a similar type of multivariate model in our independent SMC paper (see later). Robert's findings matched ours: the multivariate distribution on the marginal-transformed space can be substantially less complex than that on the original space. In particular, heavy tails can be removed, and the number of modes can be reduced, or at least the modes can be brought much closer together. Thus fewer components are needed in the multivariate mixture model, providing a more parsimonious representation of the multivariate distribution. I was glad to hear about the useful theoretical properties of this multivariate distribution.
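The recipe is easy to sketch. Below, a rank-based empirical CDF stands in for whatever marginal models are actually fitted (a hypothetical simplification on my part), and a Gaussian mixture from scikit-learn captures the dependence on the transformed scale.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def to_gaussian_margins(X):
    """Map each column to roughly N(0,1) via its empirical CDF followed
    by the normal quantile (probit) transform."""
    n = X.shape[0]
    ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
    u = ranks / (n + 1.0)          # empirical CDF values, kept inside (0,1)
    return norm.ppf(u)

rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(500, 4))       # heavy-tailed data
Z = to_gaussian_margins(X)                    # tails are tamed on this scale
mix = GaussianMixture(n_components=2).fit(Z)  # models the dependence
```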
Gael Martin then finished the session with a non-Robert-Kohn-coauthored paper on the asymptotic properties of approximate Bayesian computation (ABC). Gael started with the natural question of why we should care about asymptotic properties, given that Bayesian inference conditions on a dataset of finite size, and presented two compelling answers. First, in the context of ABC, inference is usually performed on the basis of a summary statistic (a data reduction), so it is natural to ask whether a particular choice of statistic leads to Bayesian consistency, i.e. whether the ABC posterior distribution converges to a point mass on the true parameter value. Secondly, Gael presented conditions under which we can expect asymptotic normality of the ABC posterior. In finite samples, this result can be used to guide the selection of the ABC tolerance, down to the point where no further gain can be expected from reducing it.
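For readers less familiar with ABC, here is a toy rejection sampler that makes the roles of the summary statistic and the tolerance concrete; everything in it (the normal model, the sample-mean summary) is an illustrative choice of mine, not anything from Gael's paper.

```python
import numpy as np

def abc_rejection(y_obs, prior_sample, simulate, summary, eps,
                  n=10_000, seed=0):
    """Toy ABC rejection: keep prior draws whose simulated summary falls
    within tolerance eps of the observed summary."""
    rng = np.random.default_rng(seed)
    s_obs = summary(y_obs)
    kept = []
    for _ in range(n):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if np.linalg.norm(s_sim - s_obs) <= eps:
            kept.append(theta)
    return np.asarray(kept)

# e.g. a normal mean, with the sample mean as (sufficient) summary
y = np.random.default_rng(1).normal(2.0, 1.0, size=100)
post = abc_rejection(
    y,
    prior_sample=lambda r: r.normal(0, 5),
    simulate=lambda th, r: r.normal(th, 1.0, size=100),
    summary=lambda d: np.atleast_1d(d.mean()),
    eps=0.1,
)
```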
In the second session (chaired by Minh-Ngoc Tran), David Nott kicked things off with a presentation on this paper. This research involves using variational inference to fit high-dimensional Gaussian posterior distributions. Given that the covariance matrix has a large number of parameters, it may be difficult for an optimisation algorithm (such as stochastic gradient) to efficiently find the optimal value of the variational parameters. The main novelty of the paper is to use a factor representation of the posterior covariance matrix in order to massively reduce the number of variational parameters. This can really speed up the convergence of stochastic gradient optimisation whilst still accounting for a large portion of the posterior dependence.
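A sketch of the idea, under my own notational assumptions: with Sigma = BB' + diag(delta²), the number of variational parameters grows linearly rather than quadratically in the dimension, and the reparameterisation trick still applies for stochastic gradient steps.

```python
import numpy as np

d, k = 1000, 5            # posterior dimension, number of factors
rng = np.random.default_rng(0)

# variational parameters: mean mu, loadings B (d x k), diagonal scale delta,
# so the implied covariance is Sigma = B @ B.T + np.diag(delta**2)
mu    = np.zeros(d)
B     = 0.1 * rng.standard_normal((d, k))
delta = np.ones(d)

# d*(k+2) parameters instead of d + d*(d+1)/2 for a full Gaussian:
print(d * (k + 2), "vs", d + d * (d + 1) // 2)   # 7000 vs 501500

def sample_q(rng):
    """Reparameterised draw theta = mu + B z + delta * eps, which has the
    factor covariance above and yields low-variance stochastic gradients."""
    z, eps = rng.standard_normal(k), rng.standard_normal(d)
    return mu + B @ z + delta * eps
```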
Marcel Scharth then spoke about Bayesian inference for a new high-dimensional factor stochastic volatility model. The model has a large number of parameters, which required Marcel to develop a novel combination of cutting-edge pseudo-marginal MCMC techniques. I think there could be a lot of lessons to be learned from this work about how best to use pseudo-marginal methods for state space models.
Next up was my talk (based on the work by my PhD student Leah South) about using independent MCMC proposals within sequential Monte Carlo (SMC) algorithms for static Bayesian models. We developed a flexible multivariate model (as explained above) which seems to be a useful independent proposal that takes advantage of the population of particles available in SMC. The advantages of using an independent proposal in SMC are: (1) parallelisation is straightforward, and (2) all of the independent proposals generated throughout the entire SMC algorithm can be “recycled” to obtain a larger effective sample size from the posterior and to reduce the variance of evidence estimates. We found that it helps particularly with the latter. We compared with another recycling strategy (see this paper and that paper), and also used ideas from adaptive multiple importance sampling (see there). Not surprisingly, the main drawback is the lack of any guarantee that the independent proposal sufficiently covers the tails of the posterior.
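For concreteness, here is a stripped-down version of the move step, with a single Gaussian fitted to the particle population standing in for our multivariate mixture model, and log_post assumed to be vectorised; this is a sketch, not our actual implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def independent_mh_move(particles, log_post, rng):
    """One independence MH step per particle, proposing from a Gaussian
    fitted to the current particle population."""
    q = multivariate_normal(particles.mean(axis=0),
                            np.cov(particles, rowvar=False))
    prop = q.rvs(size=len(particles), random_state=rng)
    # independence MH ratio: pi(prop) q(curr) / (pi(curr) q(prop))
    log_alpha = (log_post(prop) - log_post(particles)
                 + q.logpdf(particles) - q.logpdf(prop))
    accept = np.log(rng.uniform(size=len(particles))) < log_alpha
    particles[accept] = prop[accept]
    return particles, prop   # every proposal can later be recycled
```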
Robert Kohn’s second talk of the conference finished off the session (and the conference). The talk (based on this arXiv) was about a different way of doing correlated pseudo-marginal (e.g. in this arXiv). The so-called block pseudo-marginal method is applicable when the random numbers required to estimate the likelihood unbiasedly can be divided into blocks. Correlation between successive likelihood estimates in an MCMC algorithm is induced by proposing to update only a subset of the random-number blocks at each iteration. One advantage of this approach seems to be the ease with which randomised quasi-MC numbers can be accommodated, which can be useful in bringing down the variance of likelihood estimators. Robert presented examples in random effects models, state space models, and Bayesian big-data subsampling.
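A minimal sketch of the mechanism, with hypothetical arguments of my choosing (log_lik_hat is any unbiased log-likelihood estimator driven by standard normal random numbers u): refreshing only one block of u per iteration is what correlates successive estimates.

```python
import numpy as np

def block_pm_mh(theta0, log_prior, log_lik_hat, n_blocks, block_size,
                n_iter=1000, step=0.1, seed=0):
    """Block pseudo-marginal MH: partition the auxiliary random numbers
    into blocks and refresh a single randomly chosen block per iteration,
    inducing correlation between successive likelihood estimates."""
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    u = rng.standard_normal((n_blocks, block_size))
    lp = log_prior(theta) + log_lik_hat(theta, u)
    for _ in range(n_iter):
        theta_prop = theta + step * rng.standard_normal(theta.shape)
        u_prop = u.copy()
        u_prop[rng.integers(n_blocks)] = rng.standard_normal(block_size)
        lp_prop = log_prior(theta_prop) + log_lik_hat(theta_prop, u_prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # MH accept/reject
            theta, u, lp = theta_prop, u_prop, lp_prop
        yield theta
```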
Another Bayesian talk I went to was by Linda Tan. Linda spoke about her work (arXived there), which involves using a different prior for multiple Gaussian graphical models. The prior is advertised as more flexible than previous priors used in this area. Linda also developed an SMC algorithm to infer the posterior distribution over the presence/absence of the edges of each graph. Given that the posterior is a distribution over a multivariate binary space, I suggested that the independent MCMC proposal developed in this paper might be more efficient than local steps that propose to add or remove a small number of edges.