k Nearest Neighbor Classifier

On this blog I am going to share with you example implementations of the most common machine learning techniques. The code will be in either C# or Python.

This is the first post in a series on algorithms commonly used to implement classifiers. A classifier is a machine learning algorithm that builds a representation of data from some training dataset, and is ultimately used to classify new, unobserved instances. Classifiers are at the heart of machine learning and data mining, and they have wide use in medicine, speech and image recognition, and even finance. Broadly speaking, all classifiers fall into two categories: supervised and unsupervised. Supervised classifiers need to be ‘trained’, in the sense that they are fed training data with known classifications, which is used to construct the prior. Unsupervised classifiers are a bit more complex, and work with unlabelled/unclassified training data (e.g. k-means clustering), or even have to learn the target class for each instance by trial and error (e.g. reinforcement learning).

We’ll begin by looking at the most basic instance-based classifier, known as the K-Nearest Neighbour (kNN). Here K is the number of instances used to cast the vote when labelling a previously unobserved instance. To demonstrate how kNN works, we will use this example dataset:

Table 1
Attr 1 Attr 2 Attr 3 Class
0.7 10 300 A
0.14 9 120 A
1.0 12 200 B
1.12 15 300 B
0.4 7 150 A
0.6 8 600 A
1.15 15 600 B
1.12 11 400 B

The above is a dataset with four columns and eight rows. The first three columns are the data (kNN requires a numerical format), and the last column is the class. For the purpose of this example it does not matter what the data represents. I created a random example with a hidden pattern which, hopefully, our kNN algorithm can recognise.

Given a previously unobserved instance I={I_0,…,I_N}, we calculate the Euclidean distance between I and each known instance in the dataset as follows:

D_i= \sqrt{\sum_{k=0}^{N}(Z^{i}_{k}-I_{k})^2}

Here Z^{i}_{k} is the value of attribute k of instance i in the dataset, for which a classification is given (see the dataset), and I is the unclassified instance.
For example, given I = {Attr1=12, Attr2=11, Attr3=500}, the resulting distance matrix, after normalisation, is the following:

Table 2
D Class
11.196 A
11.772 A
10.909 B
10.792 B
11.519 A
11.295 A
10.756 B
10.774 B

The distances were calculated on normalised data. That is, instead of using the original values in Table 1, where the 3rd column dominates all other values, we normalise each value according to:

\frac{Z_{k}^{i}-Min(Z_{k})}{Max(Z_{k})-Min(Z_{k})}

Again, Z is from the dataset. The instance that we need to classify is also normalised. For example, the normalised values for Table 1 are:

Table 3
Attr 1 Attr 2 Attr 3 Class
0.554 0.375 0.375 A
0 0.25 0 A
0.851 0.625 0.167 B
0.970 1 0.375 B
0.257 0 0.062 A
0.455 0.125 1 A
1 1 1 B
0.970 0.5 0.583 B

The normalised I = {11.74, 0.5, 0.792}, which was calculated with max and min from the dataset, excluding the instance we need to classify.
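The normalisation formula is easy to sanity-check in a few lines of Python (the helper name is my own):

```python
def min_max(column):
    # rescale a column to [0, 1] using its own min and max
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

# Attr 1 from Table 1
attr1 = [0.7, 0.14, 1.0, 1.12, 0.4, 0.6, 1.15, 1.12]
normalised = [round(v, 3) for v in min_max(attr1)]
# matches the first column of Table 3: 0.554, 0, 0.851, ...
```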

Ok, now that we have calculated the distances (Table 2), we can proceed to vote on which class the instance I should belong to. To do this, we select K smallest distances and look at their corresponding classes. Let’s take K=3, then the smallest distances with votes are: 10.756 for B, 10.774 for B, and 10.792 for B. The new instance clearly belongs to class B. This is spot on, as the pattern is: Attr1<1 and Attr2<10 result in A, else B.

Consider another example: I = {0.8, 11, 500}. For this instance, after normalisation, the top 3 distances with classes are: 0.379 for B, 0.446 for A, and 0.472 for A. The majority is A, so the instance is classified as A. (Note that the hidden pattern would label this instance B, since Attr2 is not below 10, a reminder that kNN on a small dataset does not always recover the underlying rule.)
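Putting the steps together (normalise, measure distances, vote), here is a compact Python sketch of the procedure just described; the function and variable names are my own:

```python
import math

# Table 1: three numeric attributes plus a class label
data = [
    ([0.7, 10, 300], "A"), ([0.14, 9, 120], "A"),
    ([1.0, 12, 200], "B"), ([1.12, 15, 300], "B"),
    ([0.4, 7, 150], "A"), ([0.6, 8, 600], "A"),
    ([1.15, 15, 600], "B"), ([1.12, 11, 400], "B"),
]

def classify(instance, k=3):
    # per-attribute min/max taken from the training data only
    cols = list(zip(*(row for row, _ in data)))
    mins, maxs = [min(c) for c in cols], [max(c) for c in cols]

    def norm(row):
        return [(v - lo) / (hi - lo) for v, lo, hi in zip(row, mins, maxs)]

    q = norm(instance)
    # Euclidean distances to every normalised training instance
    neighbours = sorted((math.dist(norm(row), q), label) for row, label in data)
    votes = [label for _, label in neighbours[:k]]
    return max(set(votes), key=votes.count)

# reproduces the two worked examples above
classify([12, 11, 500])   # "B"
classify([0.8, 11, 500])  # "A"
```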

Several modifications of the basic kNN algorithm exist. Some employ weights to improve classification; others use an even K and break ties in a weighted fashion. Overall, it is a powerful, easy-to-implement classifier. If you are dealing with non-numeric attributes, these can be quantified through some mapping from non-numeric values to numbers. Always remember to normalise your data, otherwise you run the risk of introducing bias into the classifier.
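To illustrate the weighted variant mentioned above, one common choice is to weight each neighbour's vote by the inverse of its distance. A minimal Python sketch (names mine):

```python
from collections import defaultdict

def weighted_vote(neighbours):
    # neighbours: (distance, label) pairs; closer neighbours carry more weight
    scores = defaultdict(float)
    for dist, label in neighbours:
        scores[label] += 1.0 / (dist + 1e-12)  # epsilon guards against a zero distance
    return max(scores, key=scores.get)

# a single very close B outweighs two more distant A votes
weighted_vote([(0.1, "B"), (0.5, "A"), (0.6, "A")])  # "B"
```

With inverse-distance weighting, ties become unlikely, which is why such variants can afford an even K.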

Let’s now look at how kNN can be implemented in C#. At the start, I introduce several extension methods to aid in data manipulation. In C#, extensions are an extremely useful concept and are quite addictive. And so is LINQ. You will notice that I try to use LINQ queries where possible. Most worker methods are private and my properties are read-only. Another thing to note is that my kNN constructor is private. Instead, class users call the initialiseKNN method, which ensures that K is odd, since I am not using weights and don’t provide for tie breaks.


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace kNN
{
    //extension method to aid in algorithm implementation
    public static class Extensions
    {
        //converts string representation of number to a double
        public static IEnumerable<double> ConvertToDouble<T>(this IEnumerable<T> array)
        {
            dynamic ds;
            foreach (object st in array)
            {
                ds = st;
                yield return Convert.ToDouble(ds);
            }
        }

        //returns a row in a 2D array
        public static T[] Row<T>(this T[,] array, int r)
        {
            T[] output = new T[array.GetLength(1)];
            if (r < array.GetLength(0))
            {
                for (int i = 0; i < array.GetLength(1); i++)
                    output[i] = array[r, i];
            }
            return output;
        }

        //converts a List of Lists to a 2D matrix
        public static T[,] ToMatrix<T>(this IEnumerable<List<T>> collection, int depth, int length)
        {
            T[,] output = new T[depth, length];
            int i = 0, j = 0;
            foreach (var list in collection)
            {
                foreach (var val in list)
                {
                    output[i, j] = val;
                    j++;
                }
                i++; j = 0;
            }

            return output;
        }

        //returns the classification that appears most frequently in the array of classifications
        public static string Majority<T>(this T[] array)
        {
            if (array.Length > 0)
            {
                int unique = array.Distinct().Count();
                if (unique == 1)
                    return array[0].ToString();

                return (from item in array
                             group item by item into g
                             orderby g.Count() descending
                             select g.Key).First().ToString();
            }
            else
                return "";
        }
    }

    /// <summary>
    /// kNN class implements the K Nearest Neighbor instance based classifier
    /// </summary>
    public sealed class kNN
    {
        //private constructor allows to ensure k is odd
        private  kNN(int K, string FileName, bool Normalise)
        {
            k = K;
            PopulateDataSetFromFile(FileName, Normalise);
        }

        /// <summary>
        /// Initialises the kNN class, the observations data set and the number of neighbors to use in voting when classifying
        /// </summary>
        /// <param name="K">integer representing the number of neighbors to use when classifying instances</param>
        /// <param name="FileName">string file name containing known numeric observations with string classes</param>
        /// <param name="Normalise">boolean flag for normalising the data set</param>
        public static kNN initialiseKNN(int K, string FileName, bool Normalise)
        {
            if (K % 2 == 0 || K < 1)
                throw new ArgumentException("K must be a positive odd number.");
            return new kNN(K, FileName, Normalise);
        }

        //read-only properties
        internal int K { get { return k; } }
        internal Dictionary<List<double>, string> DataSet { get { return dataSet;} }

        /// <summary>
        /// Classifies the instance according to a kNN algorithm
        /// calculates the Euclidean distance between the instance and the known data
        /// </summary>
        /// <param name="instance">List of doubles representing the instance values</param>
        /// <returns>returns string - classification</returns>
        internal string Classify(List<double> instance)
        {
            int i=0;
            double [] normalisedInstance = new double[length];

            if (instance.Count!=length)
            {
                return "Wrong number of instance parameters.";
            }

            if (normalised)
            {
                foreach (var one in instance)
                {
                    normalisedInstance[i] = (one - originalStatsMin.ElementAt(i)) / (originalStatsMax.ElementAt(i) - originalStatsMin.ElementAt(i));
                    i++;
                }
            }
            else
            {
                normalisedInstance = instance.ToArray();
            }

            double[,] keyValue = dataSet.Keys.ToMatrix(depth, length);
            string[] classes = dataSet.Values.ToArray();

            //pair each distance with its class; a list tolerates duplicate
            //distances, which would make Dictionary.Add throw
            var distances = new List<KeyValuePair<double, string>>();
            for (i = 0; i < depth; i++)
            {
                double d = Math.Sqrt(keyValue.Row(i).Zip(normalisedInstance, (one, two) => (one - two) * (one - two)).Sum());
                distances.Add(new KeyValuePair<double, string>(d, classes[i]));
            }

            //take the classes of the k nearest neighbours and return the majority vote
            var result = (from d in distances
                          orderby d.Key ascending
                          select d.Value).Take(k).ToArray();

            return result.Majority();
        }
        /// <summary>
        /// Processes the file with the comma-separated training data and populates the dictionary
        /// all values except for the class must be numeric
        /// the class is the last element in the dataset for each record
        /// </summary>
        /// <param name="fileName">string fileName - the name of the file with the training data</param>
        /// <param name="normalise">bool normalise - true if the data needs to be normalised, false otherwise</param>
        private void PopulateDataSetFromFile(string fileName, bool normalise)
        {
            using (StreamReader sr = new StreamReader(fileName,true))
            {
                List<string> allItems = sr.ReadToEnd().TrimEnd().Split('\n').ToList();

                if (allItems.Count > 1)
                {
                    string[] array = allItems.ElementAt(0).Split(',');
                    length = array.Length - 1;
                    foreach (string item in allItems)
                    {
                        array = item.Split(',');
                        //all but the last element are numeric attributes; the last one is the class
                        dataSet.Add(array.Take(length).ConvertToDouble().ToList(), array.Last().TrimEnd());
                    }
                    depth = dataSet.Keys.Count; //needed by Classify even when the data is not normalised
                }
                else
                    Console.WriteLine("No items in the data set");
            }
            if (normalise)
            {
                NormaliseDataSet();
                normalised = true;
            }
        }

        private void NormaliseDataSet()
        {
            var keyCollection = from n in dataSet.Keys
                                select n;
            var valuesCollection = from n in dataSet.Values
                                   select n;

            depth = dataSet.Keys.Count;
            double[,] transpose = new double[length, depth];
            double[,] original = new double[depth, length];
            int i = 0, j = 0;

            //transpose
            foreach (var keyList in keyCollection)
            {
                foreach (var key in keyList)
                {
                    transpose[i, j] = key;
                    i++;
                }
                j++; i = 0;
            }

            //normalise
            double max, min;

            for (i = 0; i < length; i++)
            {
                originalStatsMax.Add (max = transpose.Row(i).Max());
                originalStatsMin.Add(min = transpose.Row(i).Min());

                for (j = 0; j < depth; j++)
                {
                    transpose[i, j] = (transpose[i, j] - min) / (max - min);
                }

            }
            for (i = 0; i < depth; i++)
            {
                for (j = 0; j < length; j++)
                    original[i, j] = transpose[j, i];
            }

            //overwrite the current values with the normalised ones
            dataSet = new Dictionary<List<double>, string>();
            for (i = 0; i < depth; i++)
            {
                dataSet.Add(original.Row(i).ToList(), valuesCollection.ElementAt(i));
            }
        }

        //private members
        private Dictionary<List<double>, string> dataSet = new Dictionary<List<double>,string>();
        private List<double> originalStatsMin = new List<double>();
        private List<double> originalStatsMax = new List<double>();
        private int k=0;
        private int length=0;
        private int depth=0;
        private bool normalised = false;
    }

    class EntryPoint
    {
        static void Main(string[] args)
        {
            kNN examplekNN = kNN.initialiseKNN(3,"DataSet.txt",true);

            List<double> instance2Classify = new List<double> {12,11,500};
            string result = examplekNN.Classify(instance2Classify);
            Console.WriteLine("This instance is classified as: {0}", result);
            Console.ReadLine();
        }
    }
}



In my next blog we will look at decision trees – a slightly more complicated, but also more powerful machine learning algorithm.

Dynamic Cast with Generic Collection

Greetings to my blog readers!

In this post I will share with you a trick for achieving something that, quite frankly, I do not recommend doing – dynamically casting types in generic collections.
In my previous post here I showed you that generic custom collections are great and type-safe, and can easily be created by deriving from System.Collections.ObjectModel.Collection<T>, which provides a wealth of ready-made functionality. So, suppose you need a custom generic collection like this, but you also need to be able to treat it as a polymorphic type. For example, if you create a collection of integers, you want to be able to convert it to a collection of doubles when the need arises. Or the other way around.

To begin with, you may try something like the following, where the NumericalList class is instantiated with the int type, and we then attempt to cast it to a list of doubles stored in doubleList:

using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;


namespace TypeSafety
{
    sealed class NumericalList<T> : System.Collections.ObjectModel.Collection<T>
    {
        
    }
    
    class EntryPoint
    {
        static void Main(string[] args)
        {
            NumericalList<int> intList = new NumericalList<int> {1,2,3,4,5};

            var doubleList = from double n in intList select n;
            
            foreach (var d in doubleList)
                Console.WriteLine(d);

            Console.ReadLine();
        }

    }
}

The above code will compile, but it will not run. The reason for this is not very intuitive, because one should be able to cast an int to a double implicitly. The runtime exception is System.InvalidCastException, originating from the line where doubleList is initialised and later surfacing in the foreach block. So, what is going on here?

Examining the IL for the above code, we can find this statement corresponding to the instructions where we attempt the implicit cast: System.Linq.Enumerable::Cast(class [mscorlib]System.Collections.IEnumerable).
Enumerable.Cast is called by the foreach iterator, and the cast fails with the exception. Cast is an extension method on IEnumerable. You can examine the source code for the whole class here. Below I am providing the snippets from referencesource.microsoft.com that are relevant to our problem:

public static IEnumerable<TResult> Cast<TResult>(this IEnumerable source) {
            IEnumerable<TResult> typedSource = source as IEnumerable<TResult>;
            if (typedSource != null) return typedSource;
            if (source == null) throw Error.ArgumentNull("source");
            return CastIterator<TResult>(source);
        }
 
        static IEnumerable<TResult> CastIterator<TResult>(IEnumerable source) {
            foreach (object obj in source) yield return (TResult)obj;
        }

Since the as operator never throws an exception, the exception is thrown from the CastIterator, where the cast takes place. The explicit cast is from object to, in our case, double, and unboxing a boxed int as a double is invalid.

Ok, now that we know where and why the code fails, let’s look at how we can fix it. A simple fix is to move the cast to where it will work, i.e. to where the selected list element (an integer) can be cast to a double one at a time:

var doubleList = from n in intList select (double)n;

This does the job, but not at the class level. To be able to perform the kind of conversion we originally tried, we need to modify the default Cast extension. In particular, we need to remove the ‘casting object to a double’ part. To do this, we can utilise the dynamic keyword, which defers type resolution to runtime. I am not going to go into the details of this keyword here; take a look at this comprehensive article in MSDN Magazine if you are interested to learn more about dynamic. Below is the modified solution, adapted for integers and doubles:

using System;
using System.Collections.Generic;
using System.Collections;
using System.Linq;

namespace TypeSafety
{
    sealed class NumericalList<T> : System.Collections.ObjectModel.Collection<T>
    {
        public IEnumerable<TRes> CustomCast<TRes>()
        {
            if (typeof(TRes).Equals(typeof(double)) || typeof(TRes).Equals(typeof(int)))
            {
                dynamic runtime;
                foreach (var tr in this)
                {
                    runtime = tr;
                    yield return (TRes)runtime;
                }
            }
        }
    }
    
    class EntryPoint
    {
        static void Main(string[] args)
        {
            NumericalList<int> intList = new NumericalList<int> {1,2,3,4,5 };
            
            var doubleList = from double n in intList.CustomCast<double>() select n;

            foreach (var d in doubleList)
                Console.WriteLine(d);

            Console.ReadLine();
        }

    }
}


The above works for doubles and integers.

Transforming Uniform Random Variables to Normal

In my previous post here I shared with you some methods for generating uniform random variables. For many practical tasks we need random variables that are distributed normally around some mean. There are two well-known approaches for transforming uniform random variables to normal, or Gaussian, ones. These are the inverse transform and the Box-Muller methods. Let’s take a look at each of them. I am also providing my version of a C# implementation.

Inverse Transform Method

The general form of this method is:

X=F^{-1} (U),\quad U\sim Unif[0,1]

Here U is a uniform random variable on the [0,1] interval, F^{-1} is the inverse of the cumulative distribution function (i.e. any given c.d.f.), and X is the desired random variable satisfying the property P(X \leq x)=F(x). Again, the c.d.f. can be for any required distribution for which the inverse is well-defined.

To make sense of this method and to see why it works, let’s take a look at the graph of a normal c.d.f. where X \sim N(0,1). Figure 1 shows the probability on the y-axis and the random variable on the x-axis. The probability axis corresponds to the range of the uniform random variable, which is [0,1]. Thus, we can use the y-axis to map to the x-axis, which gives us a random variable in (roughly) the (-3,3) interval. This random variable is normally distributed with mean 0 and standard deviation 1. In Figure 1 I have marked the middle of both axes to make it clear that uniform random variables between 0 and 0.5 map to normal random variables that are negative. The uniforms above 0.5 map to positive r.v.s.

Figure 1

I hope by now it is clear how the inverse transform method works. The main reason it always works for a well-behaved distribution like the normal distribution is because the ‘mapping’ is one-to-one. That is, for each value of U, we have a unique mapping onto X. A good intuitive way to think about the inverse transform method is by analogy to taking a percentile of some distribution.
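To see the idea in code before we derive any approximations: given any implementation of F^{-1}, the method is essentially a one-liner. Python's statistics.NormalDist, for example, exposes the inverse normal c.d.f., so a sketch of the inverse transform method looks like this:

```python
import random
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
rng = random.Random(42)

# push uniforms through the inverse c.d.f. to get standard normal draws;
# uniforms below 0.5 map to negative normals, above 0.5 to positive ones
samples = [std_normal.inv_cdf(rng.random()) for _ in range(10_000)]
```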

The standard normal cumulative distribution function is:

F(x)=\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp{\frac{-u^{2}}{2}} du

To use the inverse transform method we need to take the inverse, or calculate F^{-1}, for which an exact closed form is not computationally achievable, so we need to approximate it. Several numerical approximations can be found in the relevant literature. Check out, for example, chapter 2.3.2 in [1] or Wikipedia. I will demonstrate two approximations with different levels of accuracy.

In [1] we find a summary of the rational approximation proposed by Beasley and Springer:

F^{-1} (x) \approx \frac{\sum_{n=0}^{3} a_n (x-\frac{1}{2})^{(2n+1)}}{1+\sum_{n=0}^{3} b_n (x-\frac{1}{2})^{2n}}

Where a_n and b_n are constants (see the code for their values). The above is defined for 0.5 \leq x \leq 0.92. For 0.92 < x < 1, we can utilise Chebyshev’s approximation, as demonstrated in [1]:

F^{-1} (x) \approx \sum_{n=0}^{8} c_n (\log(-\log(1-x)))^{n}, \quad 0.92 < x < 1

The values below 0.5 can be derived by symmetry. The above method gives a high accuracy with a maximum absolute error of 3*10^{-9} over the [-7,7] range for x.

A less accurate method involves approximating the error function erf, since the inverse c.d.f. can be defined in terms of erf as follows:

F^{-1} (x)=\sqrt{2} \cdot erf^{-1} (2x-1)

The inverse error function can be approximated using:

erf^{-1} (x) \approx sign(x) \sqrt{\theta}

\theta =\sqrt{{\left(\frac{2}{\pi\alpha}+\frac{\ln(1-x^{2})}{2}\right)}^{2}-\frac{\ln(1-x^{2})}{\alpha}}-\left(\frac{2}{\pi\alpha}+\frac{\ln(1-x^{2})}{2}\right)

\alpha =\frac{8(\pi-3)}{3\pi(4-\pi)}

This approach achieves a maximum absolute error of 0.00035 for all x. To me, the erf method is simpler, both to understand and to implement.
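The α and θ formulas above translate directly into code. Here is a Python sketch (names mine), which can be checked against a library implementation of the inverse normal c.d.f.:

```python
import math
from statistics import NormalDist

ALPHA = 8.0 * (math.pi - 3.0) / (3.0 * math.pi * (4.0 - math.pi))

def inv_erf(x):
    # theta = sqrt(k^2 - ln(1-x^2)/alpha) - k, with k = 2/(pi*alpha) + ln(1-x^2)/2
    k = 2.0 / (math.pi * ALPHA) + math.log(1.0 - x * x) / 2.0
    theta = math.sqrt(k * k - math.log(1.0 - x * x) / ALPHA) - k
    return math.copysign(math.sqrt(theta), x)

def inv_cdf_erf(u):
    # F^-1(u) = sqrt(2) * erf^-1(2u - 1)
    return math.sqrt(2.0) * inv_erf(2.0 * u - 1.0)

# stays well within the quoted error bound of the exact inverse c.d.f.
err = abs(inv_cdf_erf(0.8) - NormalDist().inv_cdf(0.8))
```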

Box-Muller Method

The Box-Muller method for transforming uniform random variables to normal r.v.s is an algorithm that involves choosing a point uniformly at random from the circle of radius \sqrt{R}, where R=Z_1^{2}+Z_2^{2} follows an exponential distribution with mean 2 and (Z_1,Z_2)\sim N_2 (0,I_2 ) (i.e. bivariate standard normal).

The Box-Muller method requires two uniform random variables, and it produces two standard normal random variables. The reason for the pair is that we are simulating a point on a circle in the x,y plane. To pick a random point on a circle we need a radius and the angle it makes with the origin. If we randomise the radius and the angle (using the two random uniforms), we achieve a random way of simulating the (x,y) point on the circle. Take a look at Figure 2 below, which demonstrates this concept.

Figure 2
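In Python, the transform just described takes only a few lines (names mine):

```python
import math

def box_muller(u1, u2):
    # radius: square root of an exponential(mean 2) draw obtained from u1
    r = math.sqrt(-2.0 * math.log(u1))
    # angle: uniform on [0, 2*pi) obtained from u2
    theta = 2.0 * math.pi * u2
    # the point (x, y) on the circle gives two independent standard normals
    return r * math.cos(theta), r * math.sin(theta)
```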

Ok, hopefully the above gives you a clear explanation for the foundations of the inverse transform and the Box-Muller methods. Let’s now look at how we can implement them in C#.

Implementation

The BoxMuller class has an overloaded method that works on a single pair or on a 2D array of r.v.s:

internal sealed class BoxMuller
    {
        /// <summary>
        /// Returns an array with Z1 and Z2 - two standard normal random variables generated using the Box-Muller method
        /// </summary>
        /// <param name="U1">first standard random uniform</param>
        /// <param name="U2">second standard random uniform</param>
        /// <returns>double[] of length=2</returns>
        internal double[] BoxMullerTransform(double U1, double U2)
        {
            double[] normals = new double[2];
            if (U1 <= 0 || U1 > 1 || U2 < 0 || U2 > 1) //U1 = 0 would make Math.Log blow up
            {
                normals[0] = default(double);
                normals[1] = default(double);
            }
            else
            {
                double sqrtR = Math.Sqrt(-2.0*Math.Log(U1));
                double v = 2.0*Math.PI*U2;

                normals[0] = sqrtR * Math.Cos(v);
                normals[1] = sqrtR * Math.Sin(v);
            }
            return normals;
        }

        /// <summary>
        /// Returns a 2D array with Z1[] and Z2[] - two standard normal random variables generated using Box-Muller method
        /// </summary>
        /// <param name="U1">u1[] an array of first standard random uniforms</param>
        /// <param name="U2">u2[] an array of second standard random uniforms</param>
        /// <returns>double[L,2] of L=min(Length(u1),Length(u2))</returns>
        internal double[,] BoxMullerTransform(double[] U1, double[] U2)
        {
            long count=U1.Length<=U2.Length?U1.Length:U2.Length;

            double[,] normals = new double[count, 2];
            double u1, u2;

            for (int i = 0; i < count; i++)
            {
                u1 = U1[i];
                u2 = U2[i];
                if (u1 <= 0 || u1 > 1 || u2 < 0 || u2 > 1) //u1 = 0 would make Math.Log blow up
                {
                    normals[i,0] = default(double);
                    normals[i,1] = default(double);
                }
                else
                {
                    double sqrtR = Math.Sqrt(-2.0 * Math.Log(u1));
                    double v = 2.0 * Math.PI * u2;

                    normals[i,0] = sqrtR * Math.Cos(v);
                    normals[i,1] = sqrtR * Math.Sin(v);
                }
            }
            return normals;
        }
    }

The InverseTransform class has four internal methods for generating normal r.v.s. As described before, the ERF method is simpler and also easier to implement. The Beasley-Springer-Moro (BSM) method is based on the algorithm found in [1], pg. 68. I have verified that it gives an accurate solution up to 9 d.p. Note that InverseTransformERF is an approximation to InverseTransformMethod, and it seems to achieve an accuracy of up to 4 d.p.

internal sealed class InverseTransform
    {
        /// <summary>
        /// Returns a standard normal random variable given a standard uniform random variable
        /// </summary>
        /// <param name="x">standard uniform rv in [0,1]</param>
        /// <returns>standard normal rv in (-3,3)</returns>
        internal double InverseTransformMethod(double x)
        {
            if (x < 0 || x > 1)
                return default(double);
            
            double res,r;
            double y = x - 0.5;
            if (Math.Abs(y) < 0.42)
            {
                r = y * y;
                res = y * (((a3 * r + a2) * r + a1) * r + a0) / ((((b3 * r + b2) * r + b1) * r + b0) * r + 1.0);
            }
            else
            {
                r = (y > 0) ? (1 - x) : x;
                r = Math.Log(-1.0 * Math.Log(r));
                res = c0 + r * (c1 + r * (c2 + r * (c3 + r * (c4 + r * (c5 + r * (c6 + r * (c7 + r * c8)))))));
                res = (y < 0) ? -1.0 * res : res;
            }
            return res;
        }
        /// <summary>
        /// Returns an array of standard normal random variables given an array of standard uniform random variables
        /// </summary>
        /// <param name="x">an array of doubles: uniform rvs in [0,1]</param>
        /// <returns>an array of doubles: standard normal rvs in (-3,3)</returns>
        internal double[] InverseTransformMethod(double[] x)
        {
            long size = x.Length;
            double[] res = new double[size];
            double r,y;


            for (int i = 0; i < size;i++)
            {
                if (x[i]<0 || x[i]>1)
                {
                    res[i] = 0;
                    continue;
                }

                y = x[i] - 0.5;
                if (Math.Abs(y) < 0.42)
                {
                    r = y * y;
                    res[i] = y * (((a3 * r + a2) * r + a1) * r + a0) / ((((b3 * r + b2) * r + b1) * r + b0) * r + 1.0);
                }
                else
                {
                    r = (y > 0) ? (1 - x[i]) : x[i];
                    r = Math.Log(-1.0 * Math.Log(r));
                    res[i] = c0 + r * (c1 + r * (c2 + r * (c3 + r * (c4 + r * (c5 + r * (c6 + r * (c7 + r * c8)))))));
                    res[i] = (y < 0) ? -1.0 * res[i] : res[i];
                }
            }
            return res;
        }
        /// <summary>
        /// Performs an inverse transform using erf^-1
        /// Returns a standard normal random variable given a standard uniform random variable
        /// </summary>
        /// <param name="x">standard uniform rv in [0,1]</param>
        /// <returns>standard normal rv in (-3,3)</returns>
        internal double InverseTransformERF(double x)
        {
            return Math.Sqrt(2.0) * InverseErf(2.0 * x - 1);
        }

        /// <summary>
        /// Performs an inverse transform using erf^-1
        /// Returns an array of standard normal random variables given an array of standard uniform random variables
        /// </summary>
        /// <param name="x">an array of doubles: uniform rvs in [0,1]</param>
        /// <returns>an array of doubles: standard normal rvs in (-3,3)</returns>
        internal double[] InverseTransformERF(double[] x)
        {
            long size = x.Length;
            double[] res = new double[size];
            for (int i = 0; i < size; i++ )
            {
                res[i] = Math.Sqrt(2.0) * InverseErf(2.0 * x[i] - 1);
            }
                return res;
        }

        private double InverseErf(double x)
        {
            double res;
            double K = scalor + (Math.Log(1 - x * x)) / (2.0);
            double L=(Math.Log(1-x*x))/(alpha);
            double T = Math.Sqrt(K*K - L);

            res = Math.Sign(x) * Math.Sqrt(T-K);
            return res;
        }

        //used by the ERF method
        private const double alpha = (8.0 * (Math.PI - 3.0)) / (3.0 * Math.PI * (4.0 - Math.PI));
        private const double scalor = (2.0) / (Math.PI * alpha);

        //used by the BSM method in "Monte Carlo Methods in Financial Engineering" by P.Glasserman
        //note that a double achieves 15-17 digit precision
        //since these constants are small, we are not using the decimal type
        private const double a0 = 2.50662823884;
        private const double a1 = -18.61500062529;
        private const double a2 = 41.39119773534;
        private const double a3 = -25.44106049637;
        private const double b0 = -8.47351093090;
        private const double b1 = 23.08336743743;
        private const double b2 = -21.06224101826;
        private const double b3 = 3.13082909833;
        private const double c0 = 0.3374754822726147;
        private const double c1 = 0.9761690190917186;
        private const double c2 = 0.16079797149118209;
        private const double c3 = 0.0276438810333863;
        private const double c4 = 0.0038405729373609;
        private const double c5 = 0.0003951896511919;
        private const double c6 = 0.0000321767881768;
        private const double c7 = 0.0000002888167364;
        private const double c8 = 0.0000003960315187;

    }

Of course, the two methods map the same uniforms to different normal random variables, so how can we compare them? One way is to consider simplicity, speed, and accuracy against the exact values of the inverse normal CDF.
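As one small illustration of the accuracy side of that comparison, below is a self-contained sketch of the inverse-erf transform used above, checked against a couple of known standard normal quantiles. The class and method names here are mine, for illustration only; the constants are the same as in the class above.

```csharp
using System;

// Standalone sketch of the inverse erf approximation used above, checked
// against known quantiles of the standard normal distribution.
class WinitzkiDemo
{
    const double Alpha = (8.0 * (Math.PI - 3.0)) / (3.0 * Math.PI * (4.0 - Math.PI));
    const double Scalor = 2.0 / (Math.PI * Alpha);

    internal static double InvErf(double x)
    {
        double k = Scalor + Math.Log(1 - x * x) / 2.0;
        double l = Math.Log(1 - x * x) / Alpha;
        return Math.Sign(x) * Math.Sqrt(Math.Sqrt(k * k - l) - k);
    }

    // Maps a uniform u in (0,1) to an approximate standard normal quantile
    internal static double InvNormal(double u)
    {
        return Math.Sqrt(2.0) * InvErf(2.0 * u - 1.0);
    }

    static void Main()
    {
        // The exact 97.5% quantile is about 1.95996; this approximation
        // lands within roughly 0.003 of it.
        Console.WriteLine(InvNormal(0.975));
        Console.WriteLine(InvNormal(0.5));   // exactly 0 by symmetry
    }
}
```

Running this shows the approximation is good to about two decimal places near the centre of the distribution, which is often acceptable for Monte Carlo work but worth keeping in mind in the tails.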

References:
[1]P. Glasserman. “Monte Carlo Methods in Financial Engineering”, 2004. Springer-Verlag.

Solving ODEs Numerically

In this post we are going to look at two methods for approximating the solution to an ordinary differential equation. We will only consider first-order separable initial value problems. Here, 'first order' means that the problem contains a first derivative only. 'Separable' means that the equation can be rearranged so that all terms involving y appear on one side and all terms involving x on the other. Finally, an initial value problem is a problem for which an initial condition is given, so we have starting values for both x and y.

The following are examples of the types of problems we are considering:

\frac{dy}{dx}=xy^{\frac{1}{3}}, y(1)=1
\frac{dy}{dx}=14x-3, y(0)=0
\frac{dy}{dx}=\cos{x}-3, y(0)=0

The above examples contain expressions for the derivative of y(x). Thus, we are given the derivative, which is some function f() of x and y, and we are also given the initial value: the value of x for which y(x) is already known. Please note that we do not know the exact expression for y(x); we only have the expression for the derivative of y with respect to x.

Solving a differential equation means finding the y(x). Numerical solutions to differential equations usually approximate the y(x) at some required point. So, a numerical solution does not produce an expression for y(x), it just approximates the value of y(x) at some point x. Given enough approximations, we should also be able to graph y(x) on some interval.

There are many numerical methods for approximating solutions to differential equations. Here we will look at Euler's method and the Runge-Kutta method. Both are based on the Taylor series expansion. Since the function for the first derivative is known, we have an equation for the slope of the curve y(x) at any point in the (x,y) plane.

The basic definition of the first derivative gives \frac{dy}{dx}\approx \frac{\Delta y}{\Delta x}= \frac{y(x+h)-y(x)}{h}, which says that for some small change h we can approximate the derivative by this difference ratio. Rearranging gives y(x+h). Thus: y(x+h)=h[\frac{dy}{dx}]+y(x). This expression can be used to solve a differential equation for some other value of x: we start from the initial condition, take a small step h, and iteratively approximate y(x). The smaller the step h, the better the approximation. This is the fundamental idea behind Euler's method.
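To make the idea concrete before the full implementation below, here is a stripped-down sketch (the class and method names are mine) that takes Euler steps for \frac{dy}{dx}=\cos{x}-3 with y(0)=0; the exact solution is y=\sin{x}-3x, so y(1)=\sin(1)-3\approx -2.1585.

```csharp
using System;

// Minimal Euler stepping sketch for dy/dx = cos(x) - 3 with y(0) = 0.
// The exact solution is y = sin(x) - 3x, so y(1) = sin(1) - 3 ≈ -2.1585.
class EulerSketch
{
    internal static double Euler(Func<double, double, double> f,
                                 double x0, double y0, double target, double h)
    {
        double x = x0, y = y0;
        int steps = (int)Math.Round((target - x0) / h);
        for (int i = 0; i < steps; i++)
        {
            y += h * f(x, y);   // y(x+h) ≈ y(x) + h * dy/dx
            x += h;
        }
        return y;
    }

    static void Main()
    {
        double approx = Euler((x, y) => Math.Cos(x) - 3.0, 0.0, 0.0, 1.0, 0.01);
        Console.WriteLine(approx);  // close to sin(1) - 3 ≈ -2.1585
    }
}
```

With h = 0.01 the result agrees with the exact value to about two decimal places; halving h roughly halves the error, which is the expected first-order behaviour.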

The Runge-Kutta method is a little more complex, but it is also more accurate. It builds on the fact that y(x+h)=h[\frac{dy}{dx}]+y(x) is an approximation that can be improved upon. In particular, the classical Runge-Kutta method is a fourth-order approximation to the ordinary differential equation, thus we have: y(x+h)=y(x)+h[\frac{1}{6}(k1+2\times k2+2\times k3+k4)] where
k1=f(x,y), k2=f(x+ \frac{h}{2},y+ \frac{h}{2}\times k1), k3=f(x+ \frac{h}{2},y+ \frac{h}{2}\times k2) and k4=f(x+h,y+h\times k3).

It comes as no surprise that Runge-Kutta is more accurate than Euler's method, since it does a better job of following the curvature of y(x).

We will now look at the implementation of these two methods in C#. My code is given below. Note that I am following my usual coding style by making the individual solution classes derive from an abstract InitialValue class. Most of the variables and functionality common to both methods is contained in the parent class. The differential function is implemented as a delegate. Finally, I am also populating the XML documentation for the class constructors and the solution methods this time, as this is good practice.

using System;
using System.Collections.Generic;

namespace DifferentialEquations
{

    abstract class InitialValue
    {
        internal delegate double DerivativeFunction(double x, double y);

        internal InitialValue(DerivativeFunction func, double initValX, double initValY, double solPoint)
        {
            givenFunction = func;
            this.initValX = initValX;
            this.initValY = initValY;
            this.solPoint = solPoint;
            CalcDistance();
        }

        internal double Delta
        {
            get
            {
               return delta;
            }
        }

        internal int Distance
        {
            get
            {
                return distance;
            }
        }

        //calculates absolute distance between the x point for required solution and the initial value of x
        private void CalcDistance()
        {
            double res = 0.0;
            
            if (this.initValX < 0 && this.solPoint < 0)
            {
                res = Math.Abs(Math.Min(this.initValX, this.solPoint) - Math.Max(this.initValX, this.solPoint));
            }
            else if (this.initValX > 0 && this.solPoint > 0)
            {
                res = Math.Max(this.initValX, this.solPoint) - Math.Min(this.initValX, this.solPoint);
            }
            else
            {
                res = Math.Abs(this.solPoint) + Math.Abs(this.initValX);
            }

            this.distance = (int)res;
        }

        internal abstract double[] SolveDifferentialEquation();

        protected DerivativeFunction givenFunction;
        private const double  delta=0.01;
        protected double initValX;
        protected double initValY;
        protected double solPoint;
        private int distance=0;

    }

    sealed class EulerMethod:InitialValue
    {

        /// <summary>
        /// Initialises the Euler method
        /// </summary>
        /// <param name="func">differential function to approximate</param>
        /// <param name="initValX">initial value for x</param>
        /// <param name="initValY">initial value for y</param>
        /// <param name="solPoint">solution point: value of x for which y=f(x) is required</param>
        internal EulerMethod(DerivativeFunction func, double initValX, double initValY, double solPoint):base(func,initValX, initValY, solPoint)
        {
            altDelta = base.Delta;
        }

        /// <summary>
        /// Initialises the Euler method
        /// </summary>
        /// <param name="func">differential function to approximate</param>
        /// <param name="initValX">initial value for x</param>
        /// <param name="initValY">initial value for y</param>
        /// <param name="solPoint">solution point: value of x for which y=f(x) is required</param>
        /// <param name="delta">alternative value for delta</param>
        internal EulerMethod(DerivativeFunction func, double initValX, double initValY, double solPoint, double delta)
            : base(func, initValX, initValY, solPoint)
        {
            //this check will not provide for all cases but good enough for demonstration
            if (delta < base.Distance)
                altDelta = delta;
            else
            {
                Console.WriteLine("Delta value too large. Using default delta of 0.01.");
                altDelta = base.Delta;
            }
        }

        /// <summary>
        /// Solves a differential equation using the Euler method
        /// </summary>
        /// <returns>y=f(x) solution at the required point of x</returns>
        internal override double[] SolveDifferentialEquation()
        {
            direction = this.initValX < base.solPoint;

            //the position at index 0 will be taken by initial value
            int numPoints = (int)(base.Distance/ this.altDelta)+1;
            double[] solutionPoints = new double[numPoints];
            double x = this.initValX;

            solutionPoints[0] = this.initValY;

            for (int i = 0; i < numPoints-1; i++)
            {
                solutionPoints[i + 1] = solutionPoints[i] + altDelta * this.givenFunction(x, solutionPoints[i]);
                if (direction)
                    x = x + altDelta;
                else
                    x = x - altDelta;
            }
            Console.WriteLine("Euler solution at x = " + base.solPoint + " is " + solutionPoints.GetValue(numPoints - 1));
            return solutionPoints;
        }

        private bool direction;
        private double altDelta;
    }

    sealed class RungeKuttaMethod : InitialValue
    {
        /// <summary>
        /// Initialises the Runge-Kutta method
        /// </summary>
        /// <param name="func">differential function to approximate</param>
        /// <param name="initValX">initial value for x</param>
        /// <param name="initValY">initial value for y</param>
        /// <param name="solPoint">solution point: value of x for which y=f(x) is required</param>
        internal RungeKuttaMethod(DerivativeFunction func, double initValX, double initValY, double solPoint)
            : base(func, initValX, initValY, solPoint)
        {
            altDelta = base.Delta;
        }


        /// <summary>
        /// Initialises the Runge-Kutta method
        /// </summary>
        /// <param name="func">differential function to approximate</param>
        /// <param name="initValX">initial value for x</param>
        /// <param name="initValY">initial value for y</param>
        /// <param name="solPoint">solution point: value of x for which y=f(x) is required</param>
        /// <param name="delta">alternative delta value</param>
        internal RungeKuttaMethod(DerivativeFunction func, double initValX, double initValY, double solPoint, double delta)
            : base(func, initValX, initValY, solPoint)
        {
            //this check will not provide for all cases but good enough for demonstration
            if (delta < base.Distance)
                altDelta = delta;
            else
            {
                Console.WriteLine("Delta value too large. Using default delta of 0.01.");
                altDelta = base.Delta;
            }
        }

        /// <summary>
        /// Solves a differential equation using Runge-Kutta method
        /// </summary>
        /// <returns>y=f(x) solution at the required point of x</returns>
        internal override double[] SolveDifferentialEquation()
        {
            direction = this.initValX < base.solPoint;

            //the position at index 0 will be taken by initial value
            int numPoints = (int)(base.Distance/ this.altDelta)+1;
            double[] solutionPoints = new double[numPoints];
            double x = this.initValX;
            double k1, k2, k3, k4;
            double scale = 1.0/6.0;
            double increment = altDelta/2;

            solutionPoints[0] = this.initValY;

            for (int i = 0; i < numPoints - 1; i++)
            {
                k1 = this.givenFunction(x, solutionPoints[i]);
                k2 = this.givenFunction(x + increment, solutionPoints[i] + increment * k1);
                k3 = this.givenFunction(x + increment, solutionPoints[i] + increment * k2);
                k4 = this.givenFunction(x + altDelta, solutionPoints[i] + altDelta * k3);
                solutionPoints[i + 1] = solutionPoints[i] + altDelta*scale *(k1+2.0*k2+2.0*k3+k4);
                if (direction)
                    x = x + altDelta;
                else
                    x = x - altDelta;
            }
            Console.WriteLine("Runge-Kutta solution at x = " + base.solPoint + " is " + solutionPoints.GetValue(numPoints - 1));
            return solutionPoints;
        }

        private bool direction;
        private double altDelta;
    }

    
    class EntryPoint
    {
        static void Main(string[] args)
        {
            EulerMethod eu = new EulerMethod(DerivFunction, 0, 0,5);
            double[]resEU = eu.SolveDifferentialEquation();
            
            RungeKuttaMethod rk = new RungeKuttaMethod(DerivFunction, 0, 0,5);
            double[] resRK = rk.SolveDifferentialEquation();
            
            Console.ReadLine();
        }

        static double DerivFunction(double x, double y)
        {
            //double pow = (1.0 / 3.0);
            //return x * Math.Pow(y, pow);
            //return 14.0 * x - 3.0;
            return Math.Cos(x) - 3.0;
        }
    }
}

Below are the results for the three initial value problems, using increment h = 0.01. The Runge-Kutta method provides a very good approximation.

Table 1
Differential Equation Euler’s Runge-Kutta Exact Solution (at x=5)
\frac{dy}{dx}=xy^{\frac{1}{3}}, y(1)=1 26.8956 26.9999 27
\frac{dy}{dx}=14x-3, y(0)=0 159.6499 159.9999 160
\frac{dy}{dx}=\cos{x}-3, y(0)=0 -15.9553 -15.9589 -15.9589
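For reference, the 'Exact Solution' column can be reproduced by solving each equation with separation of variables. The sketch below uses my own hand-derived antiderivatives and class name (neither appears in the post's code), evaluated at x = 5.

```csharp
using System;

// Closed-form solutions for the three initial value problems, evaluated at x = 5.
// The antiderivatives were derived by hand via separation of variables.
class ExactSolutions
{
    // dy/dx = x*y^(1/3), y(1)=1  =>  y = ((x^2 + 2)/3)^(3/2)
    internal static double Exact1(double x)
    {
        return Math.Pow((x * x + 2.0) / 3.0, 1.5);
    }

    // dy/dx = 14x - 3, y(0)=0    =>  y = 7x^2 - 3x
    internal static double Exact2(double x)
    {
        return 7.0 * x * x - 3.0 * x;
    }

    // dy/dx = cos(x) - 3, y(0)=0 =>  y = sin(x) - 3x
    internal static double Exact3(double x)
    {
        return Math.Sin(x) - 3.0 * x;
    }

    static void Main()
    {
        Console.WriteLine(Exact1(5.0));  // 27
        Console.WriteLine(Exact2(5.0));  // 160
        Console.WriteLine(Exact3(5.0));  // about -15.9589
    }
}
```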

C# Language Features

This blog post is about some features of the C# language. Sometimes, as I am thinking about a certain language feature, I ask myself why something is the way it is. Some questions find immediate answers, though the questions are almost always rather obscure and perplexing. For example, it is immediately clear why interfaces cannot contain other interfaces: a nested interface declaration would be no different from a non-nested one. For other questions, however, I don't have immediate answers and have to spend some time thinking. Take a look and see how many of them you can answer easily. I am providing my answers right below.

  1. Why is it only possible to have one static constructor in a class?
  2. Why is it not possible to declare a class or a struct inside a method?
  3. Classes can be nested. Can methods contain methods? Can methods contain delegates?
  4. Why can static constructors only operate on static fields?
  5. Do structs receive default parameterless constructors if no constructor is provided in the struct definition?
  6. It is possible to have checked{} and unchecked{} regions of code. What is the default behavior?
  7. Why is it not possible to specify an access modifier on a static constructor? What is the provided access on static constructors?
  8. Instance constructors are not inherited. Are static constructors inherited?
  9. Will a class that has only a static constructor receive a default instance constructor?
  10. What is the output of the following piece of code:

using System;
using System.Collections.Generic;

namespace TestingStatic
{
    class Test
    {
        public Test()
        {
            t = 3;
            Console.WriteLine("Inside instance constructor");
        }

        static Test()
        {
            t = 4;
            Console.WriteLine("Inside static constructor");
        }

        public static int t;
    }
   
    class EntryPoint
    {
        static void Main(string[] args)
        {
            Test thisTest = new Test();
            Console.WriteLine(Test.t);
            Console.ReadLine();
        }
    }
}

Why is it only possible to have one static constructor in a class?

A static constructor is invoked when a class is used for the first time. If there were two or more static constructors, then the runtime would somehow need to know which one it should call. A static constructor cannot have input or output parameters, thus purely syntactically two static constructors in a class definition would have identical signatures. This makes it impossible to define more than one static constructor, but makes it very clear for the runtime which one should be invoked.

Why is it not possible to declare a class or a struct inside a method?

This question can be answered in several ways. My favorite answer stems from the language design and type hierarchy. If you have a better answer, please let me know.
So, C# constructs can be broken into types and members. The types are: classes, structs, interfaces, delegates and enums. The members are fields, methods, indexers, properties, events, operators, constructors and destructors. Are you thinking that an int is also a type? Yes, it is, but it is a primitive type which, under the above hierarchy, is simply a field. The main design idea is that types contain members that perform some operations. Here ‘contains’ means defines and operates on.
A certain inner hierarchy exists within the types. At the top we have classes and structs, which can contain all members and some other types, including other classes and structs. It would not make sense for a class to contain interfaces, because it would then still need to implement them as well. So, we could say a class 'contains' an interface when it implements it. Interfaces follow classes in this sub-hierarchy. Interfaces can contain only some members: methods, properties, events and indexers. Then come enums, which can only contain named constants. Delegates are last in this sub-hierarchy: they cannot contain any types or members, but they are still types.
Members cannot contain types. Members cannot contain other members, except for fields. For example, a method can declare its own local fields of primitive type, like int, object or string. Constructors, destructors and events are forms of methods. Properties are extensions of fields, and they are also methods. Methods cannot contain other methods. Events are fields, but of delegate type.

Classes can be nested. Can methods be nested? Can methods contain delegates?

Just like the answer to the second question above, no, methods cannot contain other methods. Methods cannot define delegates, because members cannot contain/define named types.

Why can static constructors operate only on static fields?

This has to do with the order in which static and instance fields and constructors are allocated and initialised in memory. When a class contains static fields and/or a static constructor, initialisation proceeds as follows. The static field initializers execute before the static constructor, so when these fields are referenced in the body of the static constructor, which runs next, they are already initialised. At the point of static constructor invocation, no instance fields have yet been initialised by the CLR. That happens next, followed by the instance constructors. So, static constructors can only operate on static fields because non-static fields are not guaranteed to have been initialised by the CLR at the point of execution.
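This ordering is easy to observe directly. The sketch below (all names are mine, chosen for illustration) records initialisation events in a list: the field initializer runs first, then the static constructor, and by then the field already holds its value.

```csharp
using System;
using System.Collections.Generic;

// Demonstrates that static field initializers run before the static constructor.
class InitOrder
{
    // Initialized first (textual order), so Record() can use it below.
    internal static readonly List<string> Log = new List<string>();

    static int a = Record("field initializer", 1);

    static InitOrder()
    {
        // 'a' is already initialised to 1 at this point.
        Record("static constructor", a);
    }

    static int Record(string what, int value)
    {
        Log.Add(what);
        return value;
    }

    internal static int A { get { return a; } }
}

class InitOrderDemo
{
    static void Main()
    {
        // Touching the class triggers initialisation.
        Console.WriteLine(InitOrder.A);
        Console.WriteLine(string.Join(", ", InitOrder.Log));
        // prints: field initializer, static constructor
    }
}
```

Note that declaring a static constructor also suppresses the beforefieldinit flag, which pins initialisation to the first use of the type.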

Do structs receive default parameterless constructors if no constructor is provided in the struct definition?

The answer is yes, because all value types receive an implicit parameterless constructor. This makes it possible to declare structs and other value types using the new keyword. Also, because a parameterless constructor is provided, no parameterless constructor can be defined/overloaded in the struct (note: true up to C# 6.0!). However, structs can define their own constructors with parameters. Note that, unlike with classes, this implicit parameterless constructor is not visible in ILDASM.
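A short sketch makes both halves of the answer visible: the implicit parameterless constructor zero-initialises all fields, and it coexists with any parameterised constructors the struct defines. The type name here is mine, for illustration.

```csharp
using System;

// A struct with no parameterless constructor of its own can still be created
// with 'new', which zero-initialises all fields; a parameterised constructor
// can be defined alongside.
struct Point2D
{
    internal double X;
    internal double Y;

    internal Point2D(double x, double y)
    {
        X = x;
        Y = y;
    }
}

class StructDemo
{
    static void Main()
    {
        Point2D origin = new Point2D();      // implicit parameterless ctor: X = 0, Y = 0
        Point2D p = new Point2D(3.0, 4.0);   // user-defined ctor with parameters
        Console.WriteLine(origin.X + " " + origin.Y);
        Console.WriteLine(p.X + " " + p.Y);
    }
}
```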

It is possible to have checked{} and unchecked{} regions of code. What is the default behavior?

The default, if one can call it that, is unchecked. However, this is only if the compiler is invoked without the '/checked' option. Some people are surprised when they realise that overflow occurs silently in an unchecked region of code. However, before you decide to use 'checked' everywhere from now on, consider the impact on performance.
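The difference in behaviour is easy to demonstrate. In the sketch below (names are mine), the unchecked addition wraps around silently, while the same addition in a checked context raises an OverflowException at run time.

```csharp
using System;

// Checked vs unchecked overflow behaviour for int arithmetic.
class OverflowDemo
{
    internal static int SilentOverflow()
    {
        unchecked
        {
            return int.MaxValue + 1;   // wraps around silently to int.MinValue
        }
    }

    internal static bool ThrowsWhenChecked()
    {
        try
        {
            int max = int.MaxValue;
            int next = checked(max + 1);   // raises OverflowException at run time
            Console.WriteLine(next);       // never reached
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(SilentOverflow());     // -2147483648
        Console.WriteLine(ThrowsWhenChecked());  // True
    }
}
```

Note that overflowing constant expressions (as in SilentOverflow) are a compile-time error unless they appear in an unchecked context, which is why the unchecked block is required there.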

Why is it not possible to specify an access modifier on a static constructor? What is the provided access on static constructors?

Static constructors are always private. It is not possible to change this access level, so it is not possible to specify an access modifier on a static constructor.

Instance constructors are not inherited. Are static constructors inherited?

No, they are not. No constructors or destructors are inherited.

Will a class that has only a static constructor receive a default instance constructor?

The answer is yes, it will receive a default parameterless constructor. This is because it should still be possible to create instances of such a class with the new keyword. Note that this is true for non-static classes only: static classes do not receive a default parameterless constructor.
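A minimal sketch (class names are mine) confirms this: the class below declares only a static constructor, yet instances can be created with new, because the compiler supplies a default instance constructor.

```csharp
using System;

// A non-static class declaring only a static constructor still receives
// an implicit parameterless instance constructor.
class OnlyStaticCtor
{
    static OnlyStaticCtor()
    {
        Console.WriteLine("static ctor runs once, on first use");
    }
}

class DefaultCtorDemo
{
    static void Main()
    {
        // Compiles and runs: the default instance constructor exists.
        OnlyStaticCtor a = new OnlyStaticCtor();
        OnlyStaticCtor b = new OnlyStaticCtor();
        Console.WriteLine(a != null && b != null);  // True
    }
}
```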

What is the output of this code?

The output is given below. You should be able to understand why this is the output if you understand the answer to question 4:

Inside static constructor
Inside instance constructor
3

I hope at least some of you found this useful. Let me know if you think any of my answers can be improved upon!