Hello my blog readers,
This post is to introduce a new Python package samplepy. This package was written to simplify sampling tasks that so often creep-up in machine learning. The package implements Importance, Rejection and Metropolis-Hastings sampling algorithms.
samplepy has a very simple API. The package can be installed with pip by simply running pip install samplepy. Once installed, you can use it to sample from any univariate distribution as following (showing rejection sampling use):
from samplepy import Rejection import matplotlib.pyplot as plt import numpy as np # define a unimodal function to sample under f = lambda x: 2.0*np.exp(-2.0*x) # instantiate Rejection sampling with f and required interval rej = Rejection(f, [0.01, 3.0]) # create a sample of 10K points sample = rej.sample(10000, 1) # plot the original function and the created sample set x = np.arange(0.01, 3.0, (3.0-0.01)/10000) fx = f(x) figure, axis = plt.subplots() axis.hist(sample, normed=1, bins=40) axis2 = axis.twinx() axis2.plot(x, fx, 'g', label="f(x)=2.0*exp(-2*x)") plt.legend(loc=1) plt.show()
![Sample from f(x)=2.0*exp(-2*x) over [0.01, 3.0]](https://codefying.files.wordpress.com/2016/10/rejfig_1.png?w=300&h=224)
The three sampling method (i.e. Rejection, Importance and MH) are quite different and will achieve slightly different results for the same function. Performance is another important difference factor, with Metropolis-Hastings probably being the slowest. Let’s compare how the three sampling algorithm deliver on a bi-modal univariate function:
from samplepy import Rejection, Importance, MH import matplotlib.pyplot as plt import numpy as np f = lambda x: np.exp(-1.0*x**2)*(2.0+np.sin(5.0*x)+np.sin(2.0*x)) interval = [-3.0, 3.0] rej = Rejection(f, interval) # instantiate Rejection sampling with f and interval sample = rej.sample(10000, 1) # create a sample of 10K points x = np.arange(interval[0], interval[1], (interval[1]-interval[0])/10000) fx = f(x) figure, axis = plt.subplots() axis.hist(sample, normed=1, bins=40) axis2 = axis.twinx() axis2.plot(x, fx, 'g', label="Rejection") plt.legend(loc=1) plt.show() mh = MH(f,interval) sample = mh.sample(20000, 100, 1) # Make sure we have enough points in the sample! figure, axis = plt.subplots() axis.hist(sample, normed=1, bins=40) axis2 = axis.twinx() axis2.plot(x, fx, 'g', label="MH") plt.legend(loc=1) plt.show() imp = Importance(f, interval) sample = imp.sample(10000, 0.0001, 0.0010) # create a sample where essentially no extra importance is given to any quantile figure, axis = plt.subplots() axis.hist(sample, normed=1, bins=40) axis2 = axis.twinx() axis2.plot(x, fx, 'g', label="Importance") plt.legend(loc=1) plt.show()
Hopefully this gives you enough examples to get you started using samplepy!