This file implements Markov chain Monte Carlo inference for Dirichlet processes (DP) and hierarchical Dirichlet processes (HDP). More specifically, this file implements a Gibbs sampler for the Chinese restaurant representation. See hdp.h for a description of DPs, HDPs, and the Chinese restaurant representation.
The following is an example of an HDP mixture model where the base distribution is a symmetric Dirichlet distribution with parameter 1. The likelihood distribution is a categorical over the set of ASCII characters, which we encode as an unsigned int in {1, 2, ..., 256}
.
#include <hdp/mcmc.h> using namespace core; constexpr unsigned int depth = 2; template<typename BaseDistribution, typename Likelihood, typename K, typename V> void posterior_predictive_probability( const hdp_sampler<BaseDistribution, Likelihood, K, V>& sampler, const cache<BaseDistribution, Likelihood, K, V>& cache, const K& observation) { array<weighted_feature_set<V>> paths(32); auto root_probabilities = cache.compute_root_probabilities(sampler, observation); predict(sampler, observation, paths, root_probabilities); cleanup_root_probabilities(root_probabilities, sampler.posterior.length); sort(paths, feature_set_sorter(depth)); for (unsigned int i = 0; i < paths.length; i++) { unsigned int*& path = paths[i].features.features; print("probability of drawing '", stdout); print((char) observation, stdout); if (path[0] == IMPLICIT_NODE) { print("' from any other child node: ", stdout); } else { print("' from child node ", stdout); print(path[0], stdout); print(": ", stdout); } print(exp(paths[i].log_probability), stdout); print('\n', stdout); free(paths[i]); } } template<typename BaseDistribution, typename Likelihood, typename K, typename V> void do_inference(hdp<BaseDistribution, Likelihood, K, V>& h) { hdp_sampler<BaseDistribution, Likelihood, K, V> sampler(h); cache<BaseDistribution, Likelihood, K, V> cache(sampler); prepare_sampler(sampler, cache); for (unsigned int i = 0; i < 200; i++) sample_hdp<true>(sampler, cache); for (unsigned int i = 0; i < 800; i++) { sample_hdp<true>(sampler, cache); if (i % 5 == 0) sampler.add_sample(); } posterior_predictive_probability(sampler, cache, (unsigned int) '+'); posterior_predictive_probability(sampler, cache, (unsigned int) '-'); posterior_predictive_probability(sampler, cache, (unsigned int) 'a'); } int main() { double alpha[] = {1.0e6, 1.0e-2}; dirichlet<double> base_distribution(256, 1.0); hdp<dirichlet<double>, dense_categorical<double>, unsigned int, double> h(base_distribution, alpha, depth); unsigned int first_child[] = { 1 }; unsigned int second_child[] = { 2 }; for (unsigned int i = 0; i < 100; i++) { add(h, first_child, depth, (unsigned int) '+'); add(h, second_child, depth, (unsigned int) '-'); } do_inference(h); }In this example, we use the following model:
probability of drawing '+' from child node 1: 0.283680 probability of drawing '+' from child node 2: 0.002809 probability of drawing '+' from any other child node: 0.003907 probability of drawing '-' from child node 1: 0.002809 probability of drawing '-' from child node 2: 0.283680 probability of drawing '-' from any other child node: 0.003907 probability of drawing 'a' from child node 1: 0.002809 probability of drawing 'a' from child node 2: 0.002809 probability of drawing 'a' from any other child node: 0.003906
The function do_inference
constructs the hdp_sampler and cache structures necessary to perform MCMC sampling. hdp_sampler (and node_sampler) stores the variables used by the sampling algorithm. cache is a structure that optimizes the sampling algorithm for particular choices of the base distribution and likelihood. Once constructed, the function then executes 200 "burn-in" iterations, to allow the MCMC to mix (converge to the true posterior). Then, it performs 800 more iterations, keeping every fifth sample (to minimize autocorrelation among samples, since we want them to be as independent as possible). Finally, the function posterior_predictive_probability
is called to compute the above probabilities.
A unit test is also available in mcmc.cpp, as another example.
This structure stores Gibbs sampling variables for a single HDP non-root node (i.e. a node object). These "sampler" structures form a tree parallel to the HDP hierarchy consisting of hdp and node objects.
Since the Gibbs sampler is derived using the Chinese restaurant representation, every node_sampler contains an infinite number of tables, but we only store the non-empty tables. The number of non-empty tables is given by node_sampler::table_count. However, during inference, the number of occupied tables may increase and decrease. As such, the node keeps a larger "capacity" of tables node_sampler::table_capacity to avoid reallocating memory at every sampling step. As long as table_count <= table_capacity
, the algorithm doesn't need to reallocate memory. Every observation drawn from this node is assigned to a table (just as a customer picks a table at which to sit). All tables in this node are, in turn, sampled from the distribution of the parent node. As such, the tables in this node are assigned a table in the parent node.
For an example of how to perform MCMC inference with DPs and HDPs, see the code example above.
K | the generic type of the observations drawn from this distribution. |
V | the type of the probabilities. |
Public members | |
---|---|
node_sampler< K, V > * | children |
node_sampler< K, V > * | parent |
node< K, V > * | n |
unsigned int * | observation_assignments |
unsigned int * | table_sizes |
unsigned int * | table_assignments |
unsigned int * | root_assignments |
array_multiset< K > * | descendant_observations |
unsigned int | table_count |
unsigned int | table_capacity |
unsigned int | customer_count |
typedef | K atom_type |
typedef | V value_type |
typedef | node_sampler< K, V > child_type |
typedef | node< K, V > node_type |
A native array containing the child nodes of this node in the sampler hierarchy. This array is parallel to the node::children and hdp::children array maps.
A pointer to the parent node in the sampler hierarchy.
An array parallel to node::observations that indicates which table in this node each observation is assigned to.
An array with length node_sampler::table_count (and capacity node_sampler::table_capacity) that keeps track of the number of customers sitting at each table.
Each table in this node is assigned to a table in the parent node. This array with length node_sampler::table_count (and capacity node_sampler::table_capacity) keeps track the assignments of tables in this node to tables in the parent node.
Since every table in every node in the sampler hierarchy is assigned to a table in the parent node, we can follow the chain of table assignments to tables in the root node. This array with length node_sampler::table_count (and capacity node_sampler::table_capacity) keeps track of the assignemnts of tables in this node to tables in the root node.
This array of array_multiset objects keeps track, for each table in this node, the set of observations assigned to this table, either directly or indirectly via descendant nodes in the hierarchy. This array has length node_sampler::table_count and capacity node_sampler::table_capacity.
The number of occupied tables in the Chinese restaurant represented by this sampler node.
The current capacity of tables in this sampler node.
The total number of customers sitting at all tables at this sampler node.
The generic type of the observations drawn from this distribution.
The type of the probabilities.
The type of a child node of this node in the sampler hierarchy.
The "sampler" structures form a tree parallel to the HDP hierarchy which consists of hdp and node objects. This typedef returns the type of the object that this node_sampler represents in the HDP hierarchy.
This structure stores Gibbs sampling variables for an HDP root node (i.e. an hdp object). These "sampler" structures form a tree parallel to the HDP hierarchy consisting of hdp and node objects.
Since the Gibbs sampler is derived using the Chinese restaurant representation, every hdp_sampler contains an infinite number of tables, but we only store the non-empty tables. The number of non-empty tables is given by hdp_sampler::table_count. However, during inference, the number of occupied tables may increase and decrease. As such, the node keeps a larger "capacity" of tables hdp_sampler::table_capacity to avoid reallocating memory at every sampling step. As long as table_count <= table_capacity
, the algorithm doesn't need to reallocate memory. Every observation drawn from this node is assigned to a table (just as a customer picks a table at which to sit).
For an example of how to perform MCMC inference with DPs and HDPs, see the code example above.
BaseDistribution | the type of the base distribution (see hdp). |
DataDistribution | the type of the likelihood (see hdp). |
K | the generic type of the observations drawn from this distribution. |
V | the type of the probabilities. |
Public members | |
---|---|
node_sampler< K, V > * | children |
node_type * | n |
unsigned int * | observation_assignments |
unsigned int * | table_sizes |
array_multiset< K > * | descendant_observations |
unsigned int | table_count |
unsigned int | table_capacity |
unsigned int | customer_count |
hdp_sampler (hdp< BaseDistribution, DataDistribution, K, V > & root) | |
typedef | K atom_type |
typedef | V value_type |
typedef | BaseDistribution base_distribution_type |
typedef | DataDistribution data_distribution_type |
typedef | node_sampler< K, V > child_type |
typedef | hdp< BaseDistribution, DataDistribution, K, V > node_type |
A native array containing the child nodes of this node in the sampler hierarchy. This array is parallel to the node::children and hdp::children array maps.
An array parallel to hdp::observations that indicates which table in this node each observation is assigned to.
An array with length node_sampler::table_count (and capacity node_sampler::table_capacity) that keeps track of the number of customers sitting at each table.
This array of array_multiset objects keeps track, for each table in this node, the set of observations assigned to this table, either directly or indirectly via descendant nodes in the hierarchy. This array has length node_sampler::table_count and capacity node_sampler::table_capacity.
The number of occupied tables in the Chinese restaurant represented by this sampler root node.
The current capacity of tables in this sampler root node.
The total number of customers sitting at all tables at this sampler root node.
hdp< BaseDistribution, DataDistribution, K, V > & | root | ) |
Constructs this root sampler node with the given root
node from the HDP hierarchy. All existing observations are assigned a table in the hierarchy one-by-one, each time sampling the table assignment from the conditional distribution: given that all other table assignments are fixed, the assignment for each observation is sampled. This constructor also initializes all descendant sampler nodes, recursively, parallel to the HDP hierarchy.
The generic type of the observations drawn from this distribution.
The type of the probabilities.
The type of the base distribution (see hdp).
The type of the likelihood (see hdp).
The type of a child node of this node in the sampler hierarchy.
The "sampler" structures form a tree parallel to the HDP hierarchy which consists of hdp and node objects. This typedef returns the type of the object that this hdp_sampler represents in the HDP hierarchy.
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
hdp< BaseDistribution, DataDistribution, K, V > & | root, | |
unsigned int | initial_table_count = 1 | ) |
Initializes the given root sampler node h
with the given root
node from the HDP hierarchy. All existing observations are assigned a table in the hierarchy one-by-one, each time sampling the table assignment from the conditional distribution: given that all other table assignments are fixed, the assignment for each observation is sampled. This constructor also initializes all descendant sampler nodes, recursively, parallel to the HDP hierarchy.
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | src, | |
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | dst, | |
hdp< BaseDistribution, DataDistribution, K, V > & | new_root, | |
const hash_map< const node< K, V > *, node< K, V > * > & | node_map | ) |
Copies the HDP sampler hierarchy rooted at src
into a new sampler hierarchy rooted at dst
(which should be uninitialized). The new sampler hierarchy may be associated with a new HDP hierarchy, rooted at new_root
, in which case, node_map
should contain a map of pointers to nodes from the old hierarchy to pointers to nodes in the new hierarchy. The copy
function for hdp is one way to produce such a map.
const NodeType & | node, | |
Stream & | out, | |
KeyPrinter & | key_printer, | |
AtomPrinter & | atom_printer, | |
unsigned int | level = 0 | ) |
node
to out
.NodeType | either node_sampler or hdp_sampler. |
KeyPrinter | a scribe type for which the function |
AtomPrinter | a scribe type for which the function |
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | n, | |
Stream & | stream, | |
hdp< BaseDistribution, DataDistribution, K, V > & | h, | |
KeyReader & | key_reader | ) |
n
from stream
. This function recursively reads all descendant nodes in the sampler hierarchy.KeyReader | a scribe type for which the function |
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | n, | |
Stream & | stream, | |
KeyWriter & | key_writer | ) |
n
to stream
. This function recursively writes all descendant nodes in the sampler hierarchy.KeyWriter | a scribe type for which the function |
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
cache< BaseDistribution, DataDistribution, K, V > & | cache | ) |
Performs the necessary operations to prepare the variables in the sampler hierarchy given by h
and the cache
for calls to sample_hdp. This function also recursively prepares all descendant sampler nodes.
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
const unsigned int * | path, | |
unsigned int | depth, | |
const K & | observation, | |
Cache & | cache | ) |
Adds the given observation
to the HDP hierarchy and sampler hierarchy represented by h
. The node to which to add the observation is given by the path
, which provides the indices to follow from each node to its child. If a node does not exist, it is created, with its concentration parameter copied from the appropriate index in hdp::alpha. Thus, depth
cannot be larger than hdp::depth
of h
. If depth == 1
, the observation is added to the root node.
This function also assigns the observation to the table according to the conditional distribution: Conditioned on the fact that all other random variables are fixed (including the other seating assignments), we sample the seating assignment for this new observation.
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
const unsigned int * | path, | |
unsigned int | depth, | |
const K & | observation, | |
Cache & | cache | ) |
Removes the given observation
from the HDP hierarchy and sampler hierarchy represented by h
. The node from which to remove the observation is given by the path
, which provides the indices to follow from each node to its child. If a node does not exist, a warning is printed. depth
cannot be larger than hdp::depth
of h
. If depth == 1
, the observation is removed from the root node.
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
cache< BaseDistribution, DataDistribution, K, V > & | cache | ) |
h
and all descendant sampler nodes. This function assumes prepare_sampler was called beforehand. prepare_sampler does not need to be called again before the next iteration.Collapsed | if |
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | n, | |
const V * | a, | |
const V * | b | ) |
This function implements the auxiliary variable sampler for inferring the HDP concentration parameters. This function assumes that hdp::alpha is distributed according to a Gamma distribution with parameters a[level]
and b[level]
.
This function samples a distinct alpha at every node in the hierarchy. The function sample_alpha_each_level, on the other hand, samples a distinct alpha at every level in the hierarchy (it assumes that hdp::alpha is the same across all nodes in the same level).
a | an array of shape parameters for the Gamma prior on hdp::alpha, with length hdp::depth, one element for every level in the hierarchy. |
b | an array of rate parameters for the Gamma prior on hdp::alpha, with length hdp::depth, one element for every level in the hierarchy. |
hdp_sampler< BaseDistribution, DataDistribution, K, V > & | n, | |
const V * | a, | |
const V * | b | ) |
This function implements the auxiliary variable sampler for inferring the HDP concentration parameters. This function assumes that hdp::alpha is distributed according to a Gamma distribution with parameters a[level]
and b[level]
.
This function samples a distinct alpha at every level in the hierarchy (it assumes that hdp::alpha is the same across all nodes in the same level). The function sample_alpha_each_node, on the other hand, does not make this assumption and samples a distinct hdp::alpha at every node in the hierarchy.
a | an array of shape parameters for the Gamma prior on hdp::alpha, with length hdp::depth, one element for every level in the hierarchy. |
b | an array of rate parameters for the Gamma prior on hdp::alpha, with length hdp::depth, one element for every level in the hierarchy. |
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | n, | |
const V * | a, | |
const V * | b | ) |
Returns the joint probability of the concentration parameters hdp::alpha and node::alpha under the assumption that alpha is distributed according to a Gamma distribution with parameters a[level]
and b[level]
. This function also assumes that alpha is the same across all nodes in the same level.
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
const V *const * | src | ) |
Returns a copy of the root probabilities from src
, which this function assumes were computed from the sampler hierarchy rooted at h
using the function V** compute_root_probabilities(const hdp_sampler<BaseDistribution, DataDistribution, K, V>&, const K&)
in cache.
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
const V *const * | src, | |
unsigned int | observation_count | ) |
Returns a copy of the root probabilities from src
, which this function assumes were computed from the sampler hierarchy rooted at h
using the function V** compute_root_probabilities(const hdp_sampler<BaseDistribution, DataDistribution, K, V>&, const K*, unsigned int)
in cache.
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
const array< unsigned int > * | src, | |
unsigned int | observation_count | ) |
Returns a copy of the root probabilities from src
, which this function assumes were computed from the sampler hierarchy rooted at h
using the function array<unsigned int>* compute_root_probabilities(const hdp_sampler<BaseDistribution, DataDistribution, K, V>&, const K*, unsigned int)
in the specialization of cache.
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
const K & | observation, | |
const unsigned int * | path, | |
const unsigned int *const * | excluded, | |
const unsigned int * | excluded_counts, | |
array< FeatureSet > & | x, | |
const RootDistributionType & | root_probabilities | ) |
Computes the posterior predictive probability of observation
being drawn from the nodes characterized by the path set. The path set is specified by the variables path
, excluded
, and excluded_counts
. Whenever an element of path
is IMPLICIT_NODE, this indicates that any child node at that level lies within the path set. Exclusions can be specified by the set of keys excluded[i]
which is a native array containing a sorted list of keys to be excluded from the path set, at that level. The length of excluded[i]
is given by excluded_counts[i]
.
This function uses the posterior samples stored in hdp_sampler::posterior and node_sampler::posterior to perform its computations.
The results are stored in the array x
of FeatureSet
objects.
FeatureSet | a "weighted feature set" type that can be initialized with the function |
RootDistributionType | the return type of the function |
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
const K & | observation, | |
array< FeatureSet > & | x, | |
const RootDistributionType & | root_probabilities | ) |
Computes the posterior predictive probability of observation
being drawn from the nodes characterized by the path set. The path set is specified by the variables path
, excluded
, and excluded_counts
. Whenever an element of path
is IMPLICIT_NODE, this indicates that any child node at that level lies within the path set.
This function uses the posterior samples stored in hdp_sampler::posterior and node_sampler::posterior to perform its computations.
The results are stored in the array x
of FeatureSet
objects.
FeatureSet | a "weighted feature set" type that can be initialized with the function |
RootDistributionType | the return type of the function |
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h, | |
const unsigned int * | path, | |
const unsigned int *const * | excluded, | |
const unsigned int * | excluded_counts, | |
hash_map< FeatureSet, V * > & | x, | |
const K * | observations, | |
unsigned int | observation_count, | |
const RootDistributionType & | root_probabilities | ) |
Computes the posterior predictive probability of each observation in observations
being drawn from the nodes characterized by the path set. The path set is specified by the variables path
, excluded
, and excluded_counts
. Whenever an element of path
is IMPLICIT_NODE, this indicates that any child node at that level lies within the path set. Exclusions can be specified by the set of keys excluded[i]
which is a native array containing a sorted list of keys to be excluded from the path set, at that level. The length of excluded[i]
is given by excluded_counts[i]
.
This function does not use the posterior samples stored in hdp_sampler::posterior and node_sampler::posterior to perform its computations. Rather, it uses the single point sample currently stored in the given sampler hierarchy (the state in hdp_sampler and node_sampler nodes).
The results are stored in the map x
from FeatureSet
objects to arrays of probabilities. Each probability array is parallel to observations
, where the probability at index i
is the posterior predictive probability of the observation observations[i]
.
FeatureSet | a "feature set" type that can be initialized with the function |
RootDistributionType | the return type of the function |
const hdp_sampler< BaseDistribution, DataDistribution, K, V > & | h | ) |
Returns the joint log probability of the seating arrangement in the sampler root node h
and all descendant nodes in the hierarchy.