Immutability Through Interfaces in C#

In C#, immutability is a useful concept. It is easy to imagine a collection that, under some circumstances, should act as an immutable type – take, for example, a custom method designed to print a collection’s values after some fancy formatting. Such a method should deal with the collection in a read-only manner. I also think it is good programming practice to give users the ability to pass a class around and treat it as an immutable entity, should they need to do so.

The peril of using reference types is that a complex external method may accidentally modify the object passed to it. There are several ways to safeguard against this. For example, you may store a separate copy of the original object and verify it is unchanged after the method call. Alternatively, you can define two versions of the same object: one for use as a constant, and one for use as a writable object. There is also a way to achieve this through interfaces. This idea is not mine; I first read about it in “Accelerated C# 2008”, 2nd edition.

Ok, so the concept is simple. Define three interfaces: one declares the methods for read-only actions; another declares the methods for write actions only; and the last combines the two. The class for which controlled immutability is desired implements the combined interface. Users of the class then have a choice of, firstly, how to use it themselves (as they can restrict their own access), and secondly, how to pass it to other methods. Note that those methods need to explicitly accept the object as the ‘agreed by design’ interface type.

In my simple example below I would like to have control over who has the ability to add items to a generic collection. I do this by making GenericCollection implement the IMaintainable interface, which derives from IImmutable and IModifiable.

using System;
using System.Collections.Generic;

namespace ImmutabilityThroughInterfaces
{
    public interface IImmutable<T>
    {
        T GetItem(int p);
        int CollectionSize();
    }

    public interface IModifiable<T>
    {
        void AddItem(T t, int p);
    }

    public interface IMaintainable<T> : IModifiable<T>, IImmutable<T> { }

    public sealed class GenericCollection<T> : IMaintainable<T>
    {
        public GenericCollection(int size)
        {
            Collection = new T[size];
        }

        public void AddItem(T t, int p)
        {
            if (p >= 0 && p < Collection.Length)
                Collection[p] = t;
        }

        public T GetItem(int p)
        {
            if (p >= 0 && p < Collection.Length)
                return Collection[p];
            else
                return default(T);
        }

        public IEnumerator<T> GetEnumerator()
        {
            foreach (T item in Collection)
                yield return item;
        }

        public int CollectionSize()
        {
            return Collection.Length;
        }

        private T[] Collection;
    }
}

The Main method shows a rather simplified usage example. Notice that the NoAttemptToModify method accepts the collection as IImmutable<int>. Inside this method, it is not possible to call AddItem, so the method is read-only with respect to the collection.

namespace UseExample
{
    using ImmutabilityThroughInterfaces;
    class EntryPoint
    {
        static void Main(string[] args)
        {

            GenericCollection<int> intColl = new GenericCollection<int>(3);
            intColl.AddItem(1, 0);
            intColl.AddItem(2, 1);
            intColl.AddItem(3, 2);

            var immutableCollection = intColl as IImmutable<int>;
            var maintainableCollection = intColl as IMaintainable<int>;

            AttemptToModify(maintainableCollection, 2);
            NoAttemptToModify(immutableCollection, 1);

            Console.ReadLine();
        }

        public static void AttemptToModify(IMaintainable<int> collection, int pos)
        {
            collection.AddItem(5, pos);
        }

        public static void NoAttemptToModify(IImmutable<int> collection, int pos)
        {
            collection.GetItem(pos);
        }
    }
}
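
One caveat worth noting: this protection is by convention, not enforcement. Since the underlying object still implements the full IMaintainable<int> interface, a determined method could cast the read-only view back to a writable one. A hypothetical sketch (the method name is mine):

public static void SneakyModify(IImmutable<int> collection, int pos)
{
    // the cast succeeds at run time because the object passed in
    // implements IMaintainable<int>, which includes IModifiable<int>
    var writable = collection as IModifiable<int>;
    if (writable != null)
        writable.AddItem(99, pos);
}

So the interfaces communicate intent and guard against accidental modification, rather than provide true immutability.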

In this post I gave an example of a simple but rather elegant way to achieve controlled immutability for custom types.

Object.Equals

I was reading the chapter on C# canonical forms in “Accelerated C# 2008” by Trey Nash when I came across a piece of code that puzzled me. The book is an excellent resource on the language and I can recommend it to everyone, even more seasoned C# developers.

So, the code I am referring to is about overriding the Object.Equals method for reference types. You can find it in chapter 13, on page 386 of the 2nd edition. I am not allowed to reproduce the whole program here, but I will highlight the lines of code that confused me. The code is an example of how to override Object.Equals for a reference type when you need value-type semantics. The author provides the override with the following signature:

 
public override bool Equals(object obj)
{
   ...
}

In addition, the example has a static definition for the == operator. The parameters passed are of the class type the operator is defined in:

public static bool operator==(ComplexNumber me, ComplexNumber other)
{
    return Equals(me, other);
}

In Main, two instances of the ComplexNumber class are declared, both with the same argument values. The test for equality is then invoked as:

System.Console.WriteLine("Results of Equality is {0}", referenceA == referenceB);

Ok, the above line of code will call the static operator== method, which in turn calls Equals(me, other). So, who calls Equals(object obj)? As I was reading this, I said to myself that there was no way this would compile, as there is clearly a mistake in the number of parameters in the Equals call inside operator==. This bothered me, since all other code examples in the book were ‘compilable’, at least in my head. So, I went to my desktop PC and typed up the example, fully expecting it to fail. But it did not. It compiled and ran as expected. I added breakpoints to see which method is called after Equals(me, other). To my surprise, Equals(object obj) was called! But it did not make any sense at all…

I was beginning to suspect some ‘implicit substitution’ voodoo magic, when I decided to look at the trusty IL in ILDASM. This is what I saw for op_Equality:

Figure 1

Aha! I am not crazy, because the call is clearly to Equals(object, object), not to Equals(object). So, how does a call to Equals(object, object) resolve to a call to Equals(object)? How?!

I was able to find the answer on the MSDN website. Object.Equals(Object, Object) is a static method, so it cannot be overridden. The answer to my puzzle lies in the Remarks section, where we find this line:

  • If the two objects do not represent the same object reference and neither is null, it calls objA.Equals(objB) and returns the result. This means that if objA overrides the Object.Equals(Object) method, this override is called.

This is what was going on!
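
To make the mechanism concrete, here is a minimal sketch of my own (deliberately not the book’s ComplexNumber) that reproduces the same call chain:

using System;

public class Point
{
    private readonly int x, y;
    public Point(int x, int y) { this.x = x; this.y = y; }

    // Equals(a, b) below binds to the STATIC Object.Equals(Object, Object),
    // which every type inherits and which cannot be overridden.
    public static bool operator ==(Point a, Point b) { return Equals(a, b); }
    public static bool operator !=(Point a, Point b) { return !Equals(a, b); }

    // For two distinct, non-null references, the static method calls
    // a.Equals(b), which dispatches to this virtual override.
    public override bool Equals(object obj)
    {
        Point other = obj as Point;
        if (ReferenceEquals(other, null)) return false;
        return other.x == x && other.y == y;
    }

    public override int GetHashCode() { return x ^ y; }
}

Comparing new Point(1, 2) == new Point(1, 2) goes through operator==, then through the static Object.Equals(Object, Object), and finally lands in the Equals(object) override – exactly the behaviour I observed in the debugger.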

Random Number Generators

At the heart of any Monte Carlo method are two fundamental building blocks. The first is the generation of uniform random variables. The second is the transformation of the generated uniform distribution into whichever distribution is required for the task at hand – for example, the normal distribution. The quality of the random number generator determines the quality of the final distribution.

No matter how good the underlying random number generation algorithm is, the outcome is not truly random, since the algorithm that produces the numbers is fully deterministic. In fact, I believe that very few things in life are truly random. Even the seemingly random roulette wheel is influenced by physical properties such as friction, weight and skewness. So, let’s embrace the fact that we’ll only ever be able to achieve pseudo-randomness. This should be good enough for all practical purposes. Thus, I will stop writing ‘pseudo-random’, and use the word ‘random’ instead.

In this post I will show you some very basic algorithms for generating random numbers. If you do further research, you will find that the more complex algorithms are extensions of these basic ones.

Take int x = 1. We can use x to generate the next random number. How? Well, take, for example:

x_next = (25173*x + 13849) % 201

where % is the modulus operator. x_next (also an integer) is 28. Ok, 28 is kind of random. Let’s repeat the above calculation with x = 28: x_next becomes 118. If I keep repeating the same ‘substitute and calculate’ step 10 times, I get:

28, 118, 16, 145, 106, 43, 34, 4, 172, 196

Well, that looks pretty random to me! I used this algorithm to generate 200 integers, which are shown in Figure 1.

Figure 1

Before I point out a problem with this random sample, let me explain what is going on here. The method I used to generate my random variables is called a linear congruential generator, originally introduced by D. H. Lehmer in 1951. It has the general form

x_{i+1}=(a \times x_{i}+c) \mod{m}

The above produces a sequence of random numbers between 0 and m-1. If the required uniform variables should lie in the unit interval, we can scale by 1/m. Referring back to our initial example, a, c and m are constants, and it turns out that their values are critical in determining the quality of the generated sequence. Looking at Figure 1 you should be able to spot that the sequence repeats itself exactly after only 65 iterations. Clearly, this is not a very good random number generator! Ideally, we need an algorithm that can produce all m possible values without repeating itself. This condition is known as ‘having a full period’, and it is one of the standard criteria for selecting a random number generator.

In (1), we find the list of conditions a, c and m must satisfy to ensure a full period (when c != 0):

  • c and m are relatively prime (their only common divisor is 1);
  • every prime number that divides m, also divides a-1;
  • a-1 is divisible by 4, if m is;

So, the reason we did not obtain the full period is that the chosen constants do not meet these conditions. If we change m to 2^16 = 65536 (keeping a = 25173 and c = 13849), all three conditions are satisfied and the generator attains its full period of length 2^16.
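
For concreteness, here is a minimal C# sketch of such a generator (the class and its members are my own invention, not from any library):

// Minimal linear congruential generator. With a = 25173, c = 13849 and
// m = 2^16 it attains full period; with m = 201 it reproduces the short
// sequence from the example above.
public sealed class Lcg
{
    private readonly long a, c, m;
    private long x; // current state; long avoids overflow in a * x + c

    public Lcg(long a, long c, long m, long seed)
    {
        this.a = a; this.c = c; this.m = m;
        this.x = seed;
    }

    // next integer in [0, m - 1]
    public long NextInt()
    {
        x = (a * x + c) % m;
        return x;
    }

    // next value scaled into the unit interval
    public double NextUniform()
    {
        return NextInt() / (double)m;
    }
}

Starting from new Lcg(25173, 13849, 201, 1), ten calls to NextInt() reproduce the sequence 28, 118, 16, 145, 106, 43, 34, 4, 172, 196 shown earlier.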

The demonstrated random number generator is the basis for more complex generators. A good random number generator should be able to generate millions of numbers without repeating the sequence. If we stick to the int representation, we have to take care not to overflow. Reminder: a System.Int32 variable can hold a maximum value of 2^31 - 1 = 2,147,483,647.

Many extensions to the linear congruential generator have been developed over the years, many of them combining several linear generators to achieve a much longer period. In C#, the System.Random class provides an easy way to generate random variables. A Random instance can be initialised with a given seed, or without one; when no seed is provided, the default seed is Environment.TickCount, the time in milliseconds since the computer was booted. The algorithm behind System.Random is based on a routine from “Numerical Recipes In C: The Art Of Scientific Computing”, which, according to the authors, was proposed by D. Knuth. The algorithm has elements of the linear congruential generator, but is essentially a ‘subtractive method’ in which an array of 55 pseudo-random integers is maintained. At initialisation, the array is populated in two passes. First, it is filled with values between 0 and Int32.MaxValue - 1, where the index of the array element receiving each value is selected using

x=(21 \times i)\mod{55}

This is ‘the element’ of the linear congruential generator I mentioned above. Here i is in [1, 54]. Note that c is set to zero, so the three conditions listed earlier do not apply. In the second pass, the array’s values are further randomised by cycling through the array four times and replacing each value with the difference between its current value and a value at another index approximately half the array length away. Positivity is ensured by adding Int32.MaxValue to any negative result.

Every time a user asks for the next random number, the difference between two values in the array is returned, and one of the values in the array is replaced with the value handed back to the user. Again, any negative result is made positive by adding Int32.MaxValue.
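
Below is a simplified sketch of this subtractive idea – illustrative only, with my own crude seeding, and not the exact System.Random code:

// Simplified subtractive generator: keep a ring of 55 integers and return
// differences between two entries roughly half the array apart.
public sealed class SubtractiveSketch
{
    private readonly int[] ring = new int[55];
    private int p1 = 0;  // first cursor into the ring
    private int p2 = 31; // second cursor, about half the array ahead

    public SubtractiveSketch(int seed)
    {
        // Seeding pass: scatter seed-derived values across the ring
        // using the (21 * i) mod 55 index rule mentioned above.
        int value = seed & int.MaxValue; // keep it non-negative
        for (int i = 1; i <= 55; i++)
        {
            unchecked { value = (value * 31 + 17) & int.MaxValue; }
            ring[(21 * i) % 55] = value;
        }
        // Warm-up passes to mix the ring further, as the real algorithm does.
        for (int k = 0; k < 4 * 55; k++) Next();
    }

    public int Next()
    {
        int result = ring[p1] - ring[p2];
        if (result < 0) result += int.MaxValue; // positivity fix-up
        ring[p1] = result;                      // replace an entry with the output
        p1 = (p1 + 1) % 55;
        p2 = (p2 + 1) % 55;
        return result;
    }
}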

The Random class has additional methods that allow for simulating random numbers within a given range, or for simulating random doubles. The latter returns a double equal to a randomly generated integer scaled by 1/Int32.MaxValue.
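
For example:

Random rng = new Random();        // seeded from Environment.TickCount
Random fixedRng = new Random(42); // a fixed seed gives a reproducible sequence

int anyInt = rng.Next();          // integer in [0, Int32.MaxValue)
int die = rng.Next(1, 7);         // integer in [1, 6]; the upper bound is exclusive
double u = rng.NextDouble();      // double in [0.0, 1.0)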

 
References:
(1) P. Glasserman. “Monte Carlo Methods in Financial Engineering”. Springer-Verlag, 2004.

Comparing Numerical Integration Methods

In this post I will re-introduce and compare four well-known numerical integration techniques for approximating definite integrals of single-variable functions (e.g. polynomials). The compared techniques, in order from least to most accurate, are:

  • Rectangle rule;
  • Mid-point rule;
  • Trapezoid rule;
  • and Simpson’s rule.

Let me begin with an informal description of the idea behind numerical integration. All numerical integration involves approximating the area under the function’s curve. For example, Figure 1 is a plot of f(x) = 2 + cos(2*sqrt(x)) for x in the interval between 0 and 2. The area under the curve can be approximated in different ways. The simplest is to break it up into many rectangles and compute their areas instead.

Figure 1

Figure 2 illustrates this simple ‘rectangle rule’ approach, where I show only three rectangles. Clearly, for this function, the rectangle rule will overestimate the area. This is because I evaluate the function at each rectangle’s left endpoint: the area of the first rectangle is the function value at x = a, scaled by the distance between a and b, where b is the starting point of the next rectangle. Had I evaluated at the right endpoints instead, as shown in Figure 3, I would have underestimated the area. If you are thinking that some kind of average of the two can give a more accurate approximation, you are absolutely correct – this, in fact, is what the trapezoid rule does. It is also clear that the more rectangles are used to break up the area, the more accurate the approximation becomes.

Figure 2

 
Figure 3

 

It is probably a good time to formally define these numerical methods.
For any two adjacent points a and b, the contribution of a single slice is approximated as follows:

  • Rectangle rule: \int_a^b \! f(x) \, \mathrm{d}x \approx f(a)(b-a)
  • Mid-point rule: \int_a^b \! f(x) \, \mathrm{d}x \approx f(\frac{a+b}{2})(b-a)
  • Trapezoid rule: \int_a^b \! f(x) \, \mathrm{d}x \approx \frac{1}{2}[f(a)+f(b)](b-a)
  • and Simpson’s rule: \int_a^b \! f(x) \, \mathrm{d}x \approx \frac{b-a}{6}[f(a)+4f(\frac{a+b}{2})+f(b)]

Note that in the above definitions I am assuming that b > a. Below is a rather naïve implementation of these four numerical integration methods in C#:

using System;

//functionality to perform several types of single-variable function numerical integration
namespace NumericalIntegration
{
    public abstract class NumericalIntegration
    {
        public delegate double SingleVarFunction(double x);

        public NumericalIntegration(double from, double to, double steps, SingleVarFunction func)
        {
            this.fromPoint = from;
            this.toPoint = to;
            this.numSteps = steps;
            eps=0.0005;
            h = (toPoint - fromPoint) / (numSteps + 1.0);
            this.func=func;
        }

        public abstract double Integration();

        protected double fromPoint;
        protected double toPoint;
        protected SingleVarFunction func;
        protected double numSteps;
        protected double eps;
        protected double h;
    }

    public sealed class RectangleRule: NumericalIntegration
    {
        public RectangleRule(double from, double to, double steps, SingleVarFunction func)
            :base(from,to,steps,func)
        { }

        public override double Integration()
        {
            double a = fromPoint;
            double value = func(fromPoint);
            for (int i = 1; i <= numSteps; i++)
            {
                a = a + h;
                value += func(a);
            }
            return h * value;
        }
    }
    public sealed class MidPointRule : NumericalIntegration
    {
        public MidPointRule(double from, double to, double steps, SingleVarFunction func)
            : base(from, to, steps, func)
        { }

        public override double Integration()
        {
            double a = fromPoint;
            // evaluate at the mid-point of each sub-interval: (2a + h)/2 = a + h/2
            double value = func((2.0 * fromPoint + h) / 2.0);
            for (int i = 1; i <= numSteps; i++)
            {
                a = a + h;
                value += func((2.0 * a + h) / 2.0);
            }
            return h * value;
        }
    }

    public sealed class TrapezoidRule: NumericalIntegration
    {
        public TrapezoidRule(double from, double to, double steps, SingleVarFunction func)
            : base(from, to, steps, func)
        {}
        public override double Integration()
        {
            double a = fromPoint;
            double value = (func(fromPoint)+func(fromPoint+h));
            for (int i = 1; i <= numSteps; i++)
            {
                a = a + h;
                value += (func(a) + func(a + h));
            }
            return 0.5 * h * value;
        }
    }

    public sealed class SimpsonRule: NumericalIntegration
    {
        public SimpsonRule(double from, double to, double steps, SingleVarFunction func)
            : base(from, to, steps, func)
        {
            h = (toPoint - fromPoint) / (numSteps * 2.0);
        }

        public override double Integration()
        {
            double valueEven = 0;
            double valueOdd = 0;

            for (int i = 1; i <= numSteps; i++)
            {
                valueOdd += func(fromPoint + this.h * (2.0 * i - 1));
                if (i <= numSteps-1)
                  valueEven += func(fromPoint + this.h * 2.0 * i);
            }
            return (func(fromPoint)+func(toPoint)+2.0*valueEven+4.0*valueOdd)*(this.h/3.0);
        }
    }
}

In summary, the code defines an abstract NumericalIntegration class with a delegate that points to the function over which we integrate, and an abstract Integration method which must be overridden by the deriving classes. The assignment of the integration parameters occurs through the base class constructor. However, in the case of the SimpsonRule class, I override the definition of h.
 
Please note that, for the sake of this comparison, I will not attempt to estimate the number of rectangles or partitions needed to achieve some convergence criterion. However, I still initialise eps in the base class constructor, which could be used for this purpose.
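
As a purely hypothetical illustration (this helper is not part of the original design), eps could drive a simple doubling scheme. A sketch, placed in the same file as the classes above:

// Hypothetical sketch: double the number of steps until two successive
// trapezoid estimates agree to within eps.
public static class ConvergenceHelper
{
    public static double IntegrateToTolerance(double from, double to,
        NumericalIntegration.SingleVarFunction f, double eps)
    {
        double steps = 8.0;
        double previous = new TrapezoidRule(from, to, steps, f).Integration();
        while (true)
        {
            steps *= 2.0;
            double current = new TrapezoidRule(from, to, steps, f).Integration();
            if (Math.Abs(current - previous) < eps)
                return current;
            previous = current;
        }
    }
}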
 
Ok, so let’s evaluate the Integration method of each rule using a = 0, b = 2 and 8 partitions, comparing performance on the following three functions:

  • func1: f(x)=2+\cos(2\times\sqrt{x})
  • func2: f(x)=2+\sin(2\times\sqrt{x})
  • func3: f(x)=0.4\times\ln(x^2)

The last function’s lower bound is set to 0.1 instead of zero.
Table 1 summarises the results of each numerical integration rule. The last column shows the analytical solution. Clearly, Simpson’s rule is the most accurate here.

Table 1

func   rectangle   mid-point   trapezoid   Simpson's   solution (to 5 d.p.)
1       3.68415     3.45633     3.46733     3.46000     3.46000
2       5.41971     5.51309     5.45394     5.49218     5.49946
3      -0.50542    -0.21474    -0.25244    -0.22756    -0.22676

For the sake of completeness I provide the Main function code as well:

namespace TestNumericalIntegration
{
    using NumericalIntegration;
    
    class Program
    {
        static void Main(string[] args)
        {
            double from = 0.1;
            double to = 2.0;
            double steps = 20.0;

            RectangleRule ri = new RectangleRule(from, to, steps, funcToIntegrate);
            MidPointRule mi = new MidPointRule(from, to, steps, funcToIntegrate);
            TrapezoidRule ti = new TrapezoidRule(from, to, steps, funcToIntegrate);
            SimpsonRule si = new SimpsonRule(from, to, steps, funcToIntegrate);

            double[] res = { ri.Integration(), mi.Integration(), ti.Integration(), si.Integration() };
            foreach (double r in res)
                Console.WriteLine(r.ToString("F5", System.Globalization.CultureInfo.InvariantCulture));
            Console.ReadLine();
        }

        static double funcToIntegrate(double x)
        {
            //return 2.0+System.Math.Cos(2.0*System.Math.Sqrt(x));       //first function
            //return 2.0 + System.Math.Sin(2.0 * System.Math.Sqrt(x)); //second function
            return 0.4 * System.Math.Log(x*x);                       //third function

        }
    }
}


I hope you’ve found this useful. In my next post I will examine various approaches to numerical integration of multi-dimensional integrals.

A Better Way to Define Generic Collections

Greetings to my blog readers!

Today I am feeling greedy. I want to create a collection class and I want it to do many, many things, including dancing and singing. I want it all! I want to be able to add items of any type to my collection, so it has to be a generic collection, but I also want it to be type-safe. I want it to have iteration capability, so that I can access any value, and I want some removal functionality. And yes, I want it to dance and sing. Let’s code it up.
 
Below is my coveted collection class. Nice, huh?
 

using System;
using System.Collections.Generic;
using System.Linq; // for the Where used in RemoveValueAt

public sealed class MyCollection<T>
{
        public MyCollection(int s)
        {
            if (s>=0)
              this.storage = new T[s];
        }

        public int MySize()
        {
            return storage.Length;
        }

        public void AddValueAt(T addItem, int pos)
        {
            // note: no guard against a negative pos – see the discussion below
            if (pos < MySize())
                storage[pos] = addItem;
        }

        public T GetValueAt(int i)
        {
            if (i < MySize())
                return storage[i];
            else
                return default(T);
        }

        public void RemoveValueAt(int pos)
        {
            if (pos < MySize())
            {
                IEnumerable<T> query = storage.Where((value, index) => index != pos).ToArray();
                this.storage = new T[MySize() - 1];
                int newPos = 0;
                foreach (T q in query)
                {
                    this.AddValueAt(q, newPos);
                    newPos++;
                }
            }
        }

        public IEnumerator<T> GetEnumerator()
        {
            foreach (T item in storage)
                yield return item;
        }

        public void SingAndDance()
        {
          Console.WriteLine("Singing and dancing!");
        }
        private T[] storage;
}

MyCollection is a generic collection, meaning it can be used to store items of any type: I can create a MyCollection of integers, strings, or even objects. The class has the required functionality to add new values to the storage array and to remove a value at some position, and I even implement GetEnumerator() to allow users of MyCollection to use a foreach loop.
 
As much as I like MyCollection, there is quite a bit wrong with it. Firstly, it is far from being a robust little collection. Even though I check that positions are below the allocated array size, a line of code like the following will compile:


myColl.AddValueAt(2, -1);

Of course, I will get a System.IndexOutOfRangeException at run time, but it would have been nice to catch little bugs like this earlier. Also, my RemoveValueAt() is, at best, inefficient: it rebuilds the whole array just to drop one element. Finally, I had to type all these lines of code just to give MyCollection the basic functionality and, of course, to sing and dance too. There must be a better way to create collection classes in a rich programming language like C#. And there is, and it involves inheritance. To be more specific, one should make a custom collection class derive from System.Collections.ObjectModel.Collection<T>. Doing so provides loads of functionality to the custom collection for free, including Add, Remove, RemoveAt, Clear, IndexOf, and many more. Take a look at the modified MyCollection class:

public sealed class MyCollection<T> : System.Collections.ObjectModel.Collection<T>
{
        public void SingAndDance()
        {
          Console.WriteLine("Singing and dancing!");
        }
}

And that is it. All the other functionality – like AddValueAt, RemoveValueAt, MySize and even GetEnumerator – is provided by the base class, albeit under slightly different names (Insert, RemoveAt, Count and so on). The attempt to remove an item at a negative index would, of course, still compile.
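
For example, a quick usage sketch of the inherited surface:

MyCollection<string> coll = new MyCollection<string>();
coll.Add("one");                 // from Collection<T>
coll.Insert(0, "zero");          // positional insert, instead of AddValueAt
coll.RemoveAt(1);                // instead of RemoveValueAt
Console.WriteLine(coll.Count);   // instead of MySize()
foreach (string s in coll)       // Collection<T> implements IEnumerable<T>
    Console.WriteLine(s);
coll.SingAndDance();             // our own addition still works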
 
To conclude, the moral of this post is: reuse, reuse, reuse. Also, check out this nice set of blog posts on collections and generics.