A primer on nested sampling

In this chapter we will explore the basics of the nested sampling (NS) algorithm. As the name suggests, NS is an algorithm that can sample parameter spaces in a particularly elegant way. We will first establish the basic working principle of NS and then gradually develop an understanding of how NS can be used.

A first glimpse into the NS algorithm

Conceptually, NS follows a very simple algorithm. For an objective function $E(\mathbf{x})$ it essentially consists of the following steps (a minimal code sketch follows the list):

  1. Initialization: Randomly generate $K$ so-called walker configurations $\mathbf{x}_1, \dots, \mathbf{x}_K$ from the parameter space
  2. Repeat the following steps until convergence:
    • determine the highest-energy walker $\mathbf{x}_\mathrm{max}$, which defines an iso-energy threshold $E_\mathrm{lim} = E(\mathbf{x}_\mathrm{max})$
    • remove $\mathbf{x}_\mathrm{max}$ as a sample
    • sample a new walker configuration uniformly from the region where $E(\mathbf{x}) < E_\mathrm{lim}$
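
To make the loop concrete, here is a minimal sketch of how it might look in code (Python with NumPy). All names are placeholders of my own choosing, not taken from any particular NS library; in particular, `sample_below` stands in for the constrained sampling step discussed later in this article, and for simplicity the sketch runs a fixed number of iterations instead of checking a proper convergence criterion.

```python
import numpy as np

def nested_sampling(energy, initial_walkers, n_iterations, sample_below):
    """Minimal NS outer loop.

    energy          -- the objective function E(x)
    initial_walkers -- (K, d) array of configurations drawn from the parameter space
    sample_below    -- callable (E_lim, walkers) -> new configuration drawn
                       uniformly from the region E(x) < E_lim (discussed later)
    """
    walkers = np.array(initial_walkers, dtype=float)
    samples, limits = [], []

    for _ in range(n_iterations):
        energies = np.array([energy(w) for w in walkers])
        i_max = int(np.argmax(energies))       # highest-energy walker defines E_lim
        E_lim = energies[i_max]

        samples.append(walkers[i_max].copy())  # remove it: this is an NS sample
        limits.append(E_lim)

        # replace the removed walker by a new one drawn uniformly from E(x) < E_lim
        walkers[i_max] = sample_below(E_lim, walkers)

    return np.array(samples), np.array(limits)
```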

This may sound complicated initially, so let's look at a simple 2-dimensional example. In this case the parameter space is two-dimensional, $\mathbf{x} = (x_1, x_2)$. From now on I will refer to the objective function $E(\mathbf{x})$ as energy. We will work with Hosaki's function here, which has two distinct minima: a local minimum at $(1, 2)$ and the global minimum located at $(4, 2)$.
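
For reference, here is a common benchmark parameterization of Hosaki's function (the exact form and domain used in the visualization below may differ slightly; this is just the standard textbook version, which reproduces the two minima quoted above):

```python
import numpy as np

def hosaki(x):
    """Hosaki's test function in its common benchmark form.
    Local minimum near (1, 2), global minimum near (4, 2) with E ~ -2.345."""
    x1, x2 = x
    poly = 1 - 8 * x1 + 7 * x1**2 - (7 / 3) * x1**3 + (1 / 4) * x1**4
    return poly * x2**2 * np.exp(-x2)
```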

In the interactive visualization below, you can navigate through an NS simulation employing a set of $K = 5$ walkers, which are drawn as red dots here. The energy limit $E_\mathrm{lim}$ at each iteration is indicated by a white line, and the region where $E(\mathbf{x}) > E_\mathrm{lim}$ is blurred. The highest-energy walker, which defines $E_\mathrm{lim}$, is highlighted and filled in violet. Removed walkers remain in the blurred region as empty grey circles; these are our NS samples. You can either hit Play to run the whole animation, or use the "-" and "+" buttons as well as the slider to manually hop between iterations.

So what do we observe here? We start from a set of 5 walkers, which are spread almost over the whole region. By definition of the algorithm above, the energy limit $E_\mathrm{lim}$ of each iteration is lower than that of the previous one; it has to be monotonically decreasing. Consequently, after every iteration the accessible region of the parameter space becomes smaller. The set of walkers therefore continuously zooms into the lower-energy parts of the parameter space until all walkers eventually collapse into a single point, the global minimum.

NS as an integration algorithm

So far we have recognized NS as a kind of optimization algorithm. If we start with a collection of walkers anywhere on the energy landscape, NS will ideally always end up in the global minimum. However, NS is capable of much more than that. As the name nested sampling suggests, an NS simulation generates a set of very useful samples. Before we dive into the details, let's do another interactive experiment. We use the exact same setup as before, but now, for each iteration, I highlight the region between the energy contour of the current iteration and that of the iteration before.

You see that each sample $\mathbf{x}_i$ has such a corresponding volume $V_i$ (or area in this 2D case) assigned to it. If we knew these volumes $V_i$, we could get a numerical approximation for the integral over $E(\mathbf{x})$:

$$\int E(\mathbf{x}) \, \mathrm{d}\mathbf{x} \;\approx\; \sum_i E(\mathbf{x}_i) \, V_i$$

Note that this is very similar to spanning a regular 2D grid through the parameter space. In that case, the volumes $V_i$ would simply be the volumes of the individual grid cells. We can therefore also recognize the NS algorithm as a numerical integration algorithm. The only difference to regular grid-based integration is that we perform the partitioning of the space into "tiles" differently. We do not partition it into a regular grid of squares; instead we partition it into a sequence of nested shells with individual volumes. The mathematicians among the readers might notice the parallels between grid-based integration and Riemann integration, as well as between NS integration and Lebesgue integration.
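
To make the analogy explicit, here is a small illustration of my own (not part of any NS library): the same weighted sum underlies both approaches, only the "tile" volumes differ.

```python
import numpy as np

def grid_integral(energy, x1_grid, x2_grid):
    """Regular 2D grid quadrature: every tile has the same volume dx1 * dx2.
    Assumes `energy` accepts array-valued inputs, as `hosaki` above does."""
    dx1 = x1_grid[1] - x1_grid[0]
    dx2 = x2_grid[1] - x2_grid[0]
    X1, X2 = np.meshgrid(x1_grid, x2_grid, indexing="ij")
    return float(np.sum(energy((X1, X2))) * dx1 * dx2)

def ns_integral(sample_energies, shell_volumes):
    """NS quadrature: the same sum, but each sample x_i carries its own
    shell volume V_i instead of a fixed grid-cell volume."""
    return float(np.sum(np.asarray(sample_energies) * np.asarray(shell_volumes)))
```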

Unfortunately, in NS there is no way to obtain the values of these volumes $V_i$ analytically. In practice, this would require integration over very complex parameter space domains, and if we were able to handle those, we wouldn't need the NS algorithm for the integration at all. To our rescue, the NS algorithm can give us a statistical estimate of these volumes $V_i$. The exact math behind this is slightly more involved and exceeds the scope of this article. Summarizing, the NS algorithm can be regarded as a global optimization algorithm that collects valuable samples along the way, which can later be used to compute a numerical approximation of an integral over the objective function.

The computational bottleneck of NS

In the previous sections we got an idea of the capabilities of the NS algorithm. However, I strategically ignored one central issue so far, which turns out to be the computational bottleneck of the whole algorithm: how can we sample a new walker configuration uniformly from the region where $E(\mathbf{x}) < E_\mathrm{lim}$? In the literature this problem is referred to as sampling from the bounded uniform distribution or the likelihood-constrained prior. We will review the origin of the latter name later on.

Stated in a mathematical fashion, we need to generate a sample from a distribution which has the probability density

$$p(\mathbf{x}) = \begin{cases} \dfrac{1}{V_\mathrm{lim}} & \text{if } E(\mathbf{x}) < E_\mathrm{lim} \\ 0 & \text{otherwise,} \end{cases}$$

where $V_\mathrm{lim}$ is the entire volume enclosed by the contour $E(\mathbf{x}) = E_\mathrm{lim}$. There are essentially two ways to perform this sampling: region samplers and Markov-chain Monte Carlo (MCMC) samplers. For simplicity, we will focus only on MCMC samplers here. While the name sounds intimidating at first, MCMC samplers are also conceptually simple to understand. The basic idea is to clone a walker configuration and perform a random walk on it until it is sufficiently decorrelated from its original walker. Let's see how this looks for our simple 2D example.

This example clearly demonstrates how these random walks can be used to navigate the volumes enclosed by the energy limit $E_\mathrm{lim}$. So how does this work under the hood? The underlying algorithm behind these random walks is called the Metropolis-Hastings algorithm. We initialize a random walk by

  1. Clone a random one of your walker configurations as $\mathbf{y}$

and then perform a series of Metropolis-Hastings MC steps, each of which consists of the following:

  • Sample a displacement $\Delta\mathbf{x}$ from a proposal distribution (e.g. a 2D Gaussian here)
  • Apply the displacement to the clone: $\mathbf{y}' = \mathbf{y} + \Delta\mathbf{x}$
  • Evaluate the Metropolis-Hastings acceptance criterion
  • If the step is accepted, you keep $\mathbf{y}'$, else you keep $\mathbf{y}$

Given that a sufficient number of MC steps are performed, such a random walk can ideally deliver us an independent sample from any target distribution. In our case, the target distribution is the bounded uniform distribution, and one can show without much effort that the Metropolis-Hastings acceptance criterion boils down to

$$A(\mathbf{y} \to \mathbf{y}') = \begin{cases} 1 & \text{if } E(\mathbf{y}') < E_\mathrm{lim} \\ 0 & \text{otherwise.} \end{cases}$$

In other words, we can now put the whole random walk algorithm together like this (see the code sketch after this list):

  1. Clone a random one of your walker configurations as $\mathbf{y}$
  2. Perform a random walk of $L$ MC steps
    • for i in (1, ..., L)
      • Sample a displacement $\Delta\mathbf{x}$ from a proposal distribution (e.g. a 2D Gaussian here)
      • Apply the displacement to the clone: $\mathbf{y}' = \mathbf{y} + \Delta\mathbf{x}$
      • If $E(\mathbf{y}') < E_\mathrm{lim}$, keep $\mathbf{y} \leftarrow \mathbf{y}'$, else keep $\mathbf{y}$
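
Translated into code, such a constrained random walk could look like the sketch below. The function name, the Gaussian proposal, and the step size are my own illustrative choices; a real implementation would additionally keep the walk inside the boundaries of the parameter space and possibly adapt the step size.

```python
import numpy as np

def mh_walk(energy, x_start, E_lim, n_steps, step_size, rng):
    """Decorrelate a cloned walker by L Metropolis-Hastings steps whose
    target is the bounded uniform distribution below E_lim."""
    y = np.array(x_start, dtype=float)                        # the clone y
    for _ in range(n_steps):
        y_new = y + rng.normal(scale=step_size, size=y.shape) # proposal y' = y + dx
        if energy(y_new) < E_lim:                              # acceptance rule for this target
            y = y_new                                          # accept: keep y'
        # else: reject the step and keep the current y
    return y
```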

In MCMC-based NS implementations, this is where 99.9% of the total computation time is spent, which mainly arises from the large number of evaluations of the objective function $E(\mathbf{x})$. With that, you now basically have all the necessary knowledge at hand to code your own simple NS implementation! We can also see here that NS itself is a gradient-free optimization algorithm, in contrast to numerous other global optimization algorithms out there.
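
As a rough illustration of that claim, the hypothetical pieces sketched above (`nested_sampling`, `mh_walk`, and `hosaki`) can be wired together into a toy NS run. The domain, step size, and iteration count below are arbitrary choices for demonstration, not values taken from this article's simulator.

```python
import numpy as np

def run_ns(seed, K=5, L=50, n_iter=200, step_size=0.3):
    """One toy NS run on Hosaki's function, reusing the sketches above."""
    rng = np.random.default_rng(seed)
    # initial walkers drawn uniformly from an assumed box [0, 5] x [0, 5]
    walkers0 = rng.uniform(low=[0.0, 0.0], high=[5.0, 5.0], size=(K, 2))

    def sample_below(E_lim, walkers):
        # clone a random walker (a more careful implementation would clone
        # one of the surviving walkers only)
        clone = walkers[rng.integers(len(walkers))]
        return mh_walk(hosaki, clone, E_lim, n_steps=L, step_size=step_size, rng=rng)

    return nested_sampling(hosaki, walkers0, n_iter, sample_below)

samples, limits = run_ns(seed=42)
print("lowest energy limit reached:", limits.min())  # should approach ~ -2.345
```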

A simple nested sampling simulator

In this basic introduction we already got our hands on the two main hyperparameters of NS, namely the number of walkers $K$ as well as the number of random walk steps $L$. Below you find a real NS simulator, where you can explore different energy landscapes and see how these parameters affect the sampling. Note that there is now also an input field where you can enter a numerical random seed. Since NS is a stochastic algorithm, the random seed can have a drastic effect on the outcome of a simulation.
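
With the purely illustrative `run_ns` helper from the sketch above, this run-to-run scatter is easy to reproduce:

```python
# Same setup, three different seeds: the scatter in the lowest energy limit
# reached shows how strongly the seed can influence an individual run.
for seed in (1, 2, 3):
    _, limits = run_ns(seed, K=5, L=50)
    print("seed", seed, "->", round(float(limits.min()), 3))

# More walkers and longer walks typically make runs more reliable, at higher cost:
_, limits = run_ns(seed=1, K=50, L=200)
print("K=50, L=200 ->", round(float(limits.min()), 3))
```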