hdp/mcmc.h Documentation

mcmc.h

This file implements Markov chain Monte Carlo inference for Dirichlet processes (DP) and hierarchical Dirichlet processes (HDP). More specifically, this file implements a Gibbs sampler for the Chinese restaurant representation. See hdp.h for a description of DPs, HDPs, and the Chinese restaurant representation.

Code example

The following is an example of an HDP mixture model where the base distribution is a symmetric Dirichlet distribution with parameter 1. The likelihood is a categorical distribution over the set of ASCII characters, each of which we encode as an unsigned int in {1, 2, ..., 256}.

#include <hdp/mcmc.h>
using namespace core;

constexpr unsigned int depth = 2;

template<typename BaseDistribution, typename Likelihood, typename K, typename V>
void posterior_predictive_probability(
        const hdp_sampler<BaseDistribution, Likelihood, K, V>& sampler,
        const cache<BaseDistribution, Likelihood, K, V>& cache,
        const K& observation)
{
    array<weighted_feature_set<V>> paths(32);
    auto root_probabilities = cache.compute_root_probabilities(sampler, observation);
    predict(sampler, observation, paths, root_probabilities);
    cleanup_root_probabilities(root_probabilities, sampler.posterior.length);

    sort(paths, feature_set_sorter(depth));
    for (unsigned int i = 0; i < paths.length; i++) {
        unsigned int*& path = paths[i].features.features;
        print("probability of drawing '", stdout);
        print((char) observation, stdout);
        if (path[0] == IMPLICIT_NODE) {
            print("' from any other child node: ", stdout);
        } else {
            print("' from child node ", stdout);
            print(path[0], stdout); print(": ", stdout);
        }
        print(exp(paths[i].log_probability), stdout);
        print('\n', stdout);
        free(paths[i]);
    }
}

template<typename BaseDistribution, typename Likelihood, typename K, typename V>
void do_inference(hdp<BaseDistribution, Likelihood, K, V>& h) {
    hdp_sampler<BaseDistribution, Likelihood, K, V> sampler(h);
    cache<BaseDistribution, Likelihood, K, V> cache(sampler);

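    prepare_sampler(sampler, cache);
    // burn-in: run 200 iterations whose samples are discarded while the chain mixes
    for (unsigned int i = 0; i < 200; i++)
        sample_hdp<true>(sampler, cache);
    // run 800 more iterations, keeping every fifth sample (160 samples in total)
    for (unsigned int i = 0; i < 800; i++) {
        sample_hdp<true>(sampler, cache);
        if (i % 5 == 0) sampler.add_sample();
    }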

    posterior_predictive_probability(sampler, cache, (unsigned int) '+');
    posterior_predictive_probability(sampler, cache, (unsigned int) '-');
    posterior_predictive_probability(sampler, cache, (unsigned int) 'a');
}

int main() {
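    // concentration parameters: 10^6 at the root, 10^-2 at the child nodes
    double alpha[] = {1.0e6, 1.0e-2};
    // symmetric Dirichlet base distribution with parameter 1 over 256 symbols
    dirichlet<double> base_distribution(256, 1.0);
    hdp<dirichlet<double>, dense_categorical<double>, unsigned int, double> h(base_distribution, alpha, depth);

    // observe 100 '+' characters at child node 1 and 100 '-' characters at child node 2
    unsigned int first_child[] = { 1 };
    unsigned int second_child[] = { 2 };
    for (unsigned int i = 0; i < 100; i++) {
        add(h, first_child, depth, (unsigned int) '+');
        add(h, second_child, depth, (unsigned int) '-');
    }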

    do_inference(h);
}

In this example, we use the following model:

\[ \begin{aligned} H &= \text{Dirichlet}([1, \ldots, 1]), \\ G^{\textbf{0}} &\sim DP(H, 10^6), \\ G^{\textbf{1}}, G^{\textbf{2}}, G^{\textbf{3}} &\sim DP(G^{\textbf{0}}, 10^{-2}), \\ x^{\textbf{1}}_1, \ldots, x^{\textbf{1}}_{100} &\sim G^{\textbf{1}}, \\ x^{\textbf{2}}_1, \ldots, x^{\textbf{2}}_{100} &\sim G^{\textbf{2}}, \\ y^{\textbf{1}}_i &\sim \text{Categorical}(x^{\textbf{1}}_i) \text{ for } i = 1, \ldots, 100, \\ y^{\textbf{2}}_i &\sim \text{Categorical}(x^{\textbf{2}}_i) \text{ for } i = 1, \ldots, 100. \end{aligned} \]

The only observed variables are $ y^{\textbf{1}}_i = \texttt{`+'} $ and $ y^{\textbf{2}}_i = \texttt{`-'} $ for $ i = 1, \ldots, 100 $. We want to compute the probabilities:

\[ p(y^{\textbf{1}}_{new} \mid \boldsymbol{y}), \quad p(y^{\textbf{2}}_{new} \mid \boldsymbol{y}), \quad p(y^{\textbf{3}}_{new} \mid \boldsymbol{y}), \]

where $ \boldsymbol{y} \triangleq \{y^{\textbf{1}}_1, \ldots, y^{\textbf{1}}_{100}, y^{\textbf{2}}_1, \ldots, y^{\textbf{2}}_{100}\} $ is the set of all observations, and each $ y^{\textbf{n}}_{new} $ is a new (previously unobserved) sample drawn from $ x^{\textbf{n}}_{new} $, which is in turn drawn from $ G^{\textbf{n}} $ (i.e., these are posterior predictive distributions). The code above computes exactly these probabilities, and its expected output is:

probability of drawing '+' from child node 1: 0.283680
probability of drawing '+' from child node 2: 0.002809
probability of drawing '+' from any other child node: 0.003907
probability of drawing '-' from child node 1: 0.002809
probability of drawing '-' from child node 2: 0.283680
probability of drawing '-' from any other child node: 0.003907
probability of drawing 'a' from child node 1: 0.002809
probability of drawing 'a' from child node 2: 0.002809
probability of drawing 'a' from any other child node: 0.003906
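As a back-of-the-envelope check on these numbers (our own reasoning, not code from the library): with the child-level concentration at 10^-2, the sampler should seat nearly all of a child node's observations at a single table. Under that idealization, each probability follows from the Dirichlet-categorical posterior predictive (n_c + 1) / (N + 256), where n_c counts the queried symbol among the N observations at that table, and a child node with no data falls back to the prior mean 1/256. The helper name below is hypothetical; the sampler's estimates differ in the last digits because they average over sampled seating arrangements rather than assuming one table.

```cpp
#include <cassert>

// Posterior predictive probability that the next draw equals a given symbol,
// under a symmetric Dirichlet(prior, ..., prior) prior on a categorical
// distribution over `symbol_count` symbols, after observing `total` draws
// of which `count` were that symbol.
double dirichlet_categorical_predictive(
        unsigned int count, unsigned int total,
        unsigned int symbol_count, double prior)
{
    assert(count <= total && prior > 0.0);
    return (count + prior) / (total + symbol_count * prior);
}

// Idealized values for the example (all of a child's observations at one table):
//   dirichlet_categorical_predictive(100, 100, 256, 1.0)  ~ 0.283708  ('+' from child node 1)
//   dirichlet_categorical_predictive(0, 100, 256, 1.0)    ~ 0.002809  ('-' or 'a' from child node 1)
//   dirichlet_categorical_predictive(0, 0, 256, 1.0)      = 0.00390625 (any symbol, new child node)
```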

The function do_inference constructs the hdp_sampler and cache structures needed to perform MCMC sampling. hdp_sampler and node_sampler store the variables used by the sampling algorithm, while cache optimizes the sampling algorithm for particular choices of the base distribution and likelihood. After calling prepare_sampler, the function executes 200 "burn-in" iterations, whose samples are discarded, to allow the Markov chain to mix (i.e., approach its stationary distribution, the posterior). It then performs 800 more iterations, keeping every fifth sample (160 in total) to reduce the autocorrelation among the retained samples, since we want them to be as close to independent as possible. Finally, it calls posterior_predictive_probability to compute and print the probabilities above.
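The burn-in and thinning schedule described above does not depend on the HDP itself, so it can be sketched on its own. In this sketch, run_chain is a hypothetical helper: step stands in for one call to sample_hdp<true>(sampler, cache), and record stands in for sampler.add_sample(). With burn_in = 200, iterations = 800, and thin = 5, it follows the same loop structure as do_inference and records 160 samples.

```cpp
#include <cassert>

// Runs `burn_in` sampler iterations whose states are discarded, then
// `iterations` more, recording the state after every `thin`-th of them.
// `step` performs one sampling sweep; `record` stores the current state.
// Returns the number of recorded samples.
template<typename Step, typename Record>
unsigned int run_chain(unsigned int burn_in, unsigned int iterations,
        unsigned int thin, Step step, Record record)
{
    assert(thin > 0);
    for (unsigned int i = 0; i < burn_in; i++)
        step();                  // burn-in: let the chain mix, keep nothing
    unsigned int kept = 0;
    for (unsigned int i = 0; i < iterations; i++) {
        step();
        if (i % thin == 0) {     // thinning: keep one of every `thin` states
            record();
            kept++;
        }
    }
    return kept;
}
```

Keeping the schedule separate from the sampler makes it easy to vary burn-in length or thinning interval without touching the sampling code.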

A unit test is also available in mcmc.cpp, as another example.

Classes, functions, and variables in this file
struct	node_sampler
struct	hdp_sampler
bool	init (hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, hdp< BaseDistribution, DataDistribution, K, V > & root, unsigned int initial_table_count = 1)
bool	copy (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & src, hdp_sampler< BaseDistribution, DataDistribution, K, V > & dst, hdp< BaseDistribution, DataDistribution, K, V > & new_root, const hash_map< const node< K, V > *, node< K, V > * > & node_map)
bool	print (const NodeType & node, Stream & out, KeyPrinter & key_printer, AtomPrinter & atom_printer, unsigned int level = 0)
bool	read (hdp_sampler< BaseDistribution, DataDistribution, K, V > & n, Stream & stream, hdp< BaseDistribution, DataDistribution, K, V > & h, KeyReader & key_reader)
bool	write (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & n, Stream & stream, KeyWriter & key_writer)
void	prepare_sampler (hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, cache< BaseDistribution, DataDistribution, K, V > & cache)
bool	add (hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, const unsigned int * path, unsigned int depth, const K & observation, Cache & cache)
bool	remove (hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, const unsigned int * path, unsigned int depth, const K & observation, Cache & cache)
void	sample_hdp (hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, cache< BaseDistribution, DataDistribution, K, V > & cache)
void	sample_alpha_each_node (hdp_sampler< BaseDistribution, DataDistribution, K, V > & n, const V * a, const V * b)
bool	sample_alpha_each_level (hdp_sampler< BaseDistribution, DataDistribution, K, V > & n, const V * a, const V * b)
V	log_probability_each_level (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & n, const V * a, const V * b)
V **	copy_root_probabilities (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, const V * const * src)
V **	copy_root_probabilities (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, const V * const * src, unsigned int observation_count)
array< unsigned int > *	copy_root_probabilities (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, const array< unsigned int > * src, unsigned int observation_count)
void	predict (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, const K & observation, const unsigned int * path, const unsigned int * const * excluded, const unsigned int * excluded_counts, array< FeatureSet > & x, const RootDistributionType & root_probabilities)
void	predict (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, const K & observation, array< FeatureSet > & x, const RootDistributionType & root_probabilities)
void	predict (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & h, const unsigned int * path, const unsigned int * const * excluded, const unsigned int * excluded_counts, hash_map< FeatureSet, V * > & x, const K * observations, unsigned int observation_count, const RootDistributionType & root_probabilities)
V	log_probability (const hdp_sampler< BaseDistribution, DataDistribution, K, V > & h)

struct node_sampler

template<typename K, typename V>

This structure stores Gibbs sampling variables for a single HDP non-root node (i.e. a node object). These "sampler" structures form a tree parallel to the HDP hierarchy consisting of hdp and node objects.

Since the Gibbs sampler is derived using the Chinese restaurant representation, every node_sampler conceptually contains an infinite number of tables, but only the non-empty tables are stored. The number of non-empty tables is given by node_sampler::table_count. Because this number can increase and decrease during inference, the node keeps a larger "capacity" of tables, node_sampler::table_capacity, to avoid reallocating memory at every sampling step; as long as table_count <= table_capacity, the algorithm doesn't need to reallocate memory.

Every observation drawn from this node is assigned to a table (just as a customer picks a table at which to sit). Each table in this node is, in turn, a draw from the distribution of the parent node, so each table is itself assigned to a table in the parent node.
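To make the table bookkeeping concrete, here is a self-contained toy restaurant. It is not the library's node_sampler: the struct, its fields, and the function names are hypothetical. It stores only occupied tables, grows its table array geometrically when table_count would exceed table_capacity, and seats a new customer by the Chinese restaurant process, choosing existing table k with probability n_k / (n + alpha) and a new table with probability alpha / (n + alpha). A real Gibbs step would also remove the customer first and weight each table by the likelihood of the observation, which this sketch omits.

```cpp
#include <cassert>
#include <cstdlib>
#include <random>

// Hypothetical toy restaurant that, like node_sampler, stores only occupied tables.
struct toy_restaurant {
    unsigned int* table_sizes = nullptr;
    unsigned int table_count = 0;     // number of occupied tables
    unsigned int table_capacity = 0;  // allocated slots in table_sizes
    unsigned int customer_count = 0;  // total customers seated

    toy_restaurant() = default;
    toy_restaurant(const toy_restaurant&) = delete;
    toy_restaurant& operator=(const toy_restaurant&) = delete;
    ~toy_restaurant() { std::free(table_sizes); }
};

// Grows the table array geometrically so that reallocation is rare.
bool ensure_capacity(toy_restaurant& r, unsigned int new_count) {
    if (new_count <= r.table_capacity) return true;
    unsigned int capacity = (r.table_capacity == 0) ? 4 : r.table_capacity;
    while (capacity < new_count) capacity *= 2;
    auto* resized = static_cast<unsigned int*>(
            std::realloc(r.table_sizes, sizeof(unsigned int) * capacity));
    if (resized == nullptr) return false;
    r.table_sizes = resized;
    r.table_capacity = capacity;
    return true;
}

// Seats one customer by the Chinese restaurant process with concentration alpha.
// On success, `table` holds the index of the chosen table.
template<typename RNG>
bool seat_customer(toy_restaurant& r, double alpha, RNG& rng, unsigned int& table) {
    assert(alpha > 0.0);
    std::uniform_real_distribution<double> uniform(0.0, r.customer_count + alpha);
    double u = uniform(rng);
    for (table = 0; table < r.table_count; table++) {
        if (u < r.table_sizes[table]) {   // landed in table k's share n_k
            r.table_sizes[table]++;
            r.customer_count++;
            return true;
        }
        u -= r.table_sizes[table];
    }
    // u landed in the alpha-sized share: open a new table.
    if (!ensure_capacity(r, r.table_count + 1)) return false;
    table = r.table_count++;
    r.table_sizes[table] = 1;
    r.customer_count++;
    return true;
}
```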

For an example of how to perform MCMC inference with DPs and HDPs, see the code example above.

K	the generic type of the observations drawn from this distribution.
V	the type of the probabilities.

Public members
node_sampler< K, V > *	children
node_sampler< K, V > *	parent
node< K, V > *	n
unsigned int *	observation_assignments
unsigned int *	table_sizes
unsigned int *	table_assignments
unsigned int *	root_assignments
array_multiset< K > *	descendant_observations
unsigned int	table_count
unsigned int	table_capacity
unsigned int	customer_count
typedef	K atom_type
typedef	V value_type
typedef	node_sampler< K, V > child_type
typedef	node< K, V > node_type

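struct hdp_sampler

template<typename BaseDistribution, typename DataDistribution, typename K, typename V>

BaseDistribution	the type of the base distribution (see hdp).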
DataDistribution	the type of the likelihood (see hdp).
K	the generic type of the observations drawn from this distribution.
V	the type of the probabilities.

Public members
node_sampler< K, V > *	children
node_type *	n
unsigned int *	observation_assignments
unsigned int *	table_sizes
array_multiset< K > *	descendant_observations
unsigned int	table_count
unsigned int	table_capacity
unsigned int	customer_count
	hdp_sampler (hdp< BaseDistribution, DataDistribution, K, V > & root)
typedef	K atom_type
typedef	V value_type
typedef	BaseDistribution base_distribution_type
typedef	DataDistribution data_distribution_type
typedef	node_sampler< K, V > child_type
typedef	hdp< BaseDistribution, DataDistribution, K, V > node_type

bool init (
hdp_sampler< BaseDistribution, DataDistribution, K, V > &	h,
hdp< BaseDistribution, DataDistribution, K, V > &	root,
unsigned int	initial_table_count = 1	)

bool copy (
const hdp_sampler< BaseDistribution, DataDistribution, K, V > &	src,
hdp_sampler< BaseDistribution, DataDistribution, K, V > &	dst,
hdp< BaseDistribution, DataDistribution, K, V > &	new_root,
const hash_map< const node< K, V > *, node< K, V > * > &	node_map	)

bool print (
const NodeType &	node,
Stream &	out,
KeyPrinter &	key_printer,
AtomPrinter &	atom_printer,
unsigned int	level = 0	)

NodeType	either node_sampler or hdp_sampler.
KeyPrinter	a scribe type for which the function `bool print(unsigned int, Stream&, KeyPrinter&)` or `bool print(unsigned int, Stream&, KeyPrinter&, unsigned int)` is defined (if the latter is defined, it will be called, and the level of the printed node is passed as the fourth argument). This scribe is used to print the `unsigned int` keys of child nodes in the hierarchy.
AtomPrinter	a scribe type for which the function `bool print(const K&, Stream&, AtomPrinter&)` is defined. This scribe is used to print the observations in the hierarchy.
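As an illustration of the KeyPrinter and AtomPrinter requirements, here is a minimal sketch of two scribe types suited to the code example above, where K is unsigned int holding a character code. The type names are hypothetical, and we assume the Stream is a FILE* (as in the example's calls to print with stdout); this has not been checked against the library's own print overloads.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical scribe for the unsigned int keys of child nodes.
struct indented_key_printer { };

// Level-aware overload: per the documentation above, when this four-argument
// form is defined it is the one called, and it receives the node's level.
bool print(unsigned int key, FILE*& out, indented_key_printer& printer, unsigned int level) {
    assert(out != nullptr);
    for (unsigned int i = 0; i < level; i++)
        if (fputs("  ", out) == EOF) return false;  // two spaces per level
    return fprintf(out, "child %u", key) > 0;
}

// Hypothetical scribe for observations, assuming K = unsigned int holds a
// character code as in the code example.
struct char_atom_printer { };

bool print(const unsigned int& atom, FILE*& out, char_atom_printer& printer) {
    assert(out != nullptr);
    return fprintf(out, "'%c'", (char) atom) > 0;
}
```

Instances of these types would then be passed as the key_printer and atom_printer arguments, e.g. print(sampler, stdout, keys, atoms) with keys an indented_key_printer and atoms a char_atom_printer.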

bool read (
hdp_sampler< BaseDistribution, DataDistribution, K, V > &	n,
Stream &	stream,
hdp< BaseDistribution, DataDistribution, K, V > &	h,
KeyReader &	key_reader	)

hdp_sampler< BaseDistribution, DataDistribution, K, V > &	h,
cache< BaseDistribution, DataDistribution, K, V > &	cache	)

hdp_sampler< BaseDistribution, DataDistribution, K, V > &	h,
const unsigned int *	path,
unsigned int	depth,
const K &	observation,
Cache &	cache	)

hdp_sampler< BaseDistribution, DataDistribution, K, V > &	n,
const V *	a,
const V *	b	)

a	an array of shape parameters for the Gamma prior on hdp::alpha, with length hdp::depth, one element for every level in the hierarchy.
b	an array of rate parameters for the Gamma prior on hdp::alpha, with length hdp::depth, one element for every level in the hierarchy.

a	an array of shape parameters for the Gamma prior on hdp::alpha, with length hdp::depth, one element for every level in the hierarchy.
b	an array of rate parameters for the Gamma prior on hdp::alpha, with length hdp::depth, one element for every level in the hierarchy.
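Because a holds shape and b holds rate parameters, the prior on the concentration parameter at each level has density b^a x^(a-1) e^(-bx) / Γ(a) and mean a/b. The sketch below uses hypothetical helper names and is not the library's implementation of sample_alpha_each_node, sample_alpha_each_level, or log_probability_each_level; it only shows how one (a, b) pair per level lines up with the per-level concentrations in hdp::alpha.

```cpp
#include <cassert>
#include <cmath>

// Log-density of Gamma(shape a, rate b) at x > 0:
//   a*log(b) - lgamma(a) + (a - 1)*log(x) - b*x
double log_gamma_density(double x, double a, double b) {
    assert(x > 0.0 && a > 0.0 && b > 0.0);
    return a * std::log(b) - std::lgamma(a) + (a - 1.0) * std::log(x) - b * x;
}

// Sum of the per-level Gamma log-priors on the concentration parameters;
// alpha, a, and b each have one entry per level (length `depth`).
double log_prior_alpha(const double* alpha, const double* a,
        const double* b, unsigned int depth)
{
    double sum = 0.0;
    for (unsigned int level = 0; level < depth; level++)
        sum += log_gamma_density(alpha[level], a[level], b[level]);
    return sum;
}
```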

V ** copy_root_probabilities (
const hdp_sampler< BaseDistribution, DataDistribution, K, V > &	h,
const V * const *	src	)

array< unsigned int > * copy_root_probabilities (
const hdp_sampler< BaseDistribution, DataDistribution, K, V > &	h,
const array< unsigned int > *	src,
unsigned int	observation_count	)
