# Nullable Types

In C#, the value null can be assigned to reference types or, more generally, to nullable types. A reference set to null points to no object, and null is a perfectly legal value for any reference type.

C# is a strongly typed language, and the distinction between reference and value types is at the core of its semantics. For example, it is not possible to overload the ?? operator, because that would let developers blur the boundary between reference and value types. In case you are wondering what this ‘??’ operator is: it is called the null-coalescing operator, and it allows for a short-hand comparison and assignment like this:

```csharp
string firstString = default(string);
string secondString = firstString ?? "apples";
```



In the above, secondString is assigned “apples”, because the ?? operator works like this: if firstString is null, assign “apples”; otherwise, assign firstString. Note that if, instead of “apples”, I had used some null-valued variable or just null, the result of the assignment would be null.

If you have encountered the null-coalescing operator before, I bet you did not know it can be nested! For example, the code below is legal. Can you work out what the value of thirdString will be?

```csharp
string firstString = default(string);
string secondString = firstString ?? "apples";
string thirdString = firstString ??
                     secondString ??
                     "pears";
```


The value of thirdString will be set to “apples”, because nested null-coalescing works like a nested if-else. The above can be read like this: if firstString is not null, take its value; otherwise, if secondString is not null, take its value; else take “pears”.
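To make the nested reading concrete, here is a hand-expanded if-else version (my own sketch of the semantics, not the code the compiler actually emits):

```csharp
using System;

class CoalesceDemo
{
    // hand-expanded form of: firstString ?? secondString ?? "pears"
    public static string CoalesceExpanded(string first, string second)
    {
        if (first != null)
            return first;
        else if (second != null)
            return second;
        else
            return "pears";
    }

    static void Main()
    {
        string firstString = default(string);            // null
        string secondString = firstString ?? "apples";   // "apples"

        // both forms produce "apples"
        Console.WriteLine(firstString ?? secondString ?? "pears");
        Console.WriteLine(CoalesceExpanded(firstString, secondString));
    }
}
```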

Value types cannot be used with null-coalescing out of the box, because it is not legal to assign null to a value type. For example, something like this will never compile:

```csharp
int firstInt = null;   // compile error: cannot convert null to 'int'
```


In C#, int is an alias for System.Int32, which is internally represented as a struct, i.e. a value type. There is another special struct called System.Nullable&lt;T&gt;. It is special because it can wrap any of the C# value types and, in addition, take on the null value. Can you see where I am going with this? It means we can use null-coalescing with value types if we make them nullable. To make a value type nullable, append the ? suffix to its type. For example:


```csharp
int? firstInt = null;
int? secondInt = firstInt ?? 3;
```


firstInt is now a nullable type and can be used with the null-coalescing operator. Is there any boxing overhead once we make a value type nullable? No, there is not. The firstInt and secondInt integers are still value types, i.e. they are still structs; at no point do they get boxed into objects. In fact, the compiler translates the ?? operator into a sequence of load and brtrue instructions: a call to get_HasValue() is made on firstInt, which returns false, so the branch is not taken and control falls through to the load and assignment of 3.
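In C# terms, this is roughly what the compiler arranges (a sketch using the real Nullable&lt;T&gt; members HasValue and GetValueOrDefault(), not the literal IL):

```csharp
using System;

class NullableCoalesceDemo
{
    // rough equivalent of: int? secondInt = firstInt ?? 3;
    public static int Coalesce(int? first, int fallback)
    {
        // the compiler emits a HasValue test and a conditional branch
        return first.HasValue ? first.GetValueOrDefault() : fallback;
    }

    static void Main()
    {
        int? firstInt = null;
        Console.WriteLine(Coalesce(firstInt, 3)); // 3
        Console.WriteLine(Coalesce(42, 3));       // 42
    }
}
```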

# Delegates and Actions

In one of my older blog posts I wrote a short tutorial on delegates and events. I wrote it after reviewing the concepts myself, hoping to create a simplified version of the material for my readers.

Recently I came across another built-in C# type that represents a delegate that does not return a value. This delegate type is called Action, and you can read more about it on the MSDN site.
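As a quick illustration (the example and names here are mine), Action and its generic siblings save you from declaring a delegate type of your own:

```csharp
using System;

class ActionDemo
{
    // without Action, you would declare the delegate type yourself
    delegate void PrintDelegate(string message);

    public static string LastPrinted;

    public static void Print(string message)
    {
        LastPrinted = message;
        Console.WriteLine(message);
    }

    static void Main()
    {
        // the hand-rolled delegate type...
        PrintDelegate viaCustom = Print;
        viaCustom("hello via custom delegate");

        // ...and the equivalent built-in Action<string>
        Action<string> viaAction = Print;
        viaAction("hello via Action");
    }
}
```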

The Action delegate simplifies the creation of delegates that do not need to return a value, and several built-in features of C# use it. One of these is Parallel.ForEach(), which can be used to loop through an enumerable collection in parallel. The call below passes ForEach two arguments: the IEnumerable collection and a delegate (here an anonymous lambda matching the Action&lt;int, ParallelLoopState, long&gt; overload) to perform on each item. Here is a quick example that prints out the contents of intList in parallel (note the output order is not guaranteed):

```csharp
List<int> intList = new List<int>() { 1, 2, 3, 4, 5 };
Parallel.ForEach(intList, (e, state, i) => Console.WriteLine(e.ToString()));
```



If you examine the IL code for the above, you will see that an Action-based delegate instance encapsulates the anonymous method.

Thinking about Action got me wondering whether I could rewrite my previous delegate tutorial with a simplified syntax. This is what I came up with:

```csharp
using System;
using System.Collections;

namespace ActionsAndEvents
{
    public class ListWithEightEvent : ArrayList
    {
        // event based on an Action delegate
        public event Action EnteredEight;

        // invoke the event after an addition of an 8
        public override int Add(object value)
        {
            int i = base.Add(value);
            if (EnteredEight != null && value.ToString() == "8")
                EnteredEight();
            return i;
        }
    }

    class Test
    {
        public static void Main()
        {
            ListWithEightEvent list = new ListWithEightEvent();
            // register the event with the AddedEight method
            list.EnteredEight += AddedEight;

            string input;
            Console.WriteLine("Keep entering numbers. Enter 'x' to stop.");
            do
            {
                input = Console.ReadLine();
                list.Add(input);
            } while (input != "x");
        }

        // This will be called whenever the list receives an 8
        static void AddedEight()
        {
            Console.WriteLine("You have entered an 8!");
        }
    }
}
```


To me, the above code is cleaner and simpler than the earlier delegate and EventHandler version.

# Inline Initialization

I must admit that I was rather surprised when I learnt that inline initialization translates into constructor-based initialization in the IL. So, for example, these two pieces of code are equivalent:

```csharp
internal sealed class Inline
{
    private int x = 5;
}
```


is the same as

```csharp
internal sealed class Inline
{
    public Inline()
    {
        x = 5;
    }

    private int x;
}
```


The above can be verified by examining the IL of the first example through ILDASM and inspecting the code the compiler generates for the .ctor.
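Roughly, the .ctor IL for the first example looks like this (a sketch written from memory, not a verbatim ILDASM dump):

```
.method public hidebysig specialname rtspecialname
        instance void .ctor() cil managed
{
  ldarg.0
  ldc.i4.5
  stfld    int32 Inline::x                                // inline initializer runs first...
  ldarg.0
  call     instance void [mscorlib]System.Object::.ctor() // ...then the base constructor
  ret
}
```

Note that the inline initializer executes before the call to the base class constructor.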

As pointed out in ([1], chapter 8), the compiler emits the inline initialization code at the start of every constructor the type defines, so non-constant fields initialized inline can lead to IL code bloat in types with many constructors.

In one of my previous posts on C# language features I wrote that parameterless instance constructors are not allowed in value types (a restriction that was lifted only in C# 10). This is also why inline initialization of instance fields is illegal in structs. For example, the following won’t compile:

```csharp
struct NoInline
{
    int a = 5;   // compile error
}
```


If you examine the IL for the struct, you will see that it contains no code for a default .ctor. The implicit parameterless constructor is intrinsic to the struct (i.e. it is provided by the C# language implementation) and cannot be modified. Thus, for a C# struct, there is no way to translate inline initializers into initialization inside the parameterless constructor.

The story is very different for static fields: in value types, static fields can be initialized inline. As you may have already guessed, such initialization translates into initialization inside the static constructor (.cctor). For example, these two pieces of code are equivalent in the IL:

```csharp
struct NoInline
{
    static int a = 5;
}
```


is the same as

```csharp
struct NoInline
{
    static NoInline()
    {
        a = 5;
    }

    static int a;
}
```


References:
[1] J. Richter. CLR via C#, 4th edition. Microsoft Press, 2012.

# Readonly Fields

Greetings to all my blog readers!

In C# the readonly keyword can be very handy. It allows one to define fields that are assigned at the point of instance creation (in the instance constructor), or inline, at the point of declaration. Once assigned, the fields remain immutable and cannot be reassigned by the type consumers. Of course, another type instance can be created, and it may have different values assigned to its readonly fields (e.g. if the constructor uses input parameters to perform the assignment).

Unlike constants, readonly fields are initialized at run time. This can be useful for avoiding the assembly versioning problems that can arise from using constants. Great! Let’s go ahead and create some readonly arrays!

In the code below I create an int array, myIntArray, which is marked with the readonly keyword and assigned in the constructor of the IntArray class. In the Main method I subsequently modify the first value in the array and print the array out to show that it has changed. The code compiles and prints out 6, 2, 3, 4 and 5. But wait! Didn’t I just say that readonly fields cannot be modified by the type consumers? What is going on here?

```csharp
using System;
using System.Linq;

internal sealed class IntArray
{
    internal IntArray()
    {
        myIntArray[0] = 1;
        myIntArray[1] = 2;
        myIntArray[2] = 3;
        myIntArray[3] = 4;
        myIntArray[4] = 5;
    }

    internal readonly int[] myIntArray = new int[5];
}

class EntryPoint
{
    static void Main(string[] args)
    {
        IntArray ia = new IntArray();
        ia.myIntArray[0] = 6;

        for (int i = 0; i < ia.myIntArray.Count(); i++)
            Console.WriteLine(ia.myIntArray[i].ToString());
    }
}
```


Well, in the above code we got exactly what we asked for: the int array field is readonly, that is, its reference is readonly. So, if we create another instance of IntArray and attempt the following reassignment, it won’t work:

```csharp
IntArray ia2 = new IntArray();
ia2.myIntArray = ia.myIntArray;   // compile error
```


The above won’t compile and the error is: ‘A readonly field cannot be assigned to (except in a constructor or a variable initializer).’

When a readonly field is a reference type, it is the reference, and not the object it points to, that becomes readonly. To protect the object itself, I can suggest one of the following two approaches:

1. The easiest way is to make the readonly field private, assign it once in the constructor, and expose it via a get-only property. Note that this protects the reference from reassignment through the property; a consumer that obtains the array can still modify its elements:

```csharp
internal IntArray()
{
    myIntArray[0] = 1;
    myIntArray[1] = 2;
    myIntArray[2] = 3;
    myIntArray[3] = 4;
    myIntArray[4] = 5;
}

internal int[] MyIntArray
{
    get { return myIntArray; }
}

private readonly int[] myIntArray = new int[5];
```


… in the type consumer…

```csharp
IntArray ia = new IntArray();
int[] arrayCopy = ia.MyIntArray;

for (int i = 0; i < arrayCopy.Count(); i++)
    Console.WriteLine(arrayCopy[i].ToString());
```



2. You can achieve reference-field immutability by implementing readonly interfaces; check out this post if you want to know more. With this design, the type consumer must make a conscious choice to ensure immutability.
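Relatedly, a third option worth knowing (my own addition, not from the list above) is to expose the array through Array.AsReadOnly, which wraps it in a ReadOnlyCollection&lt;int&gt; whose indexer has no setter, so consumers cannot modify the elements at all:

```csharp
using System;
using System.Collections.ObjectModel;

public sealed class SafeIntArray
{
    private readonly int[] myIntArray = { 1, 2, 3, 4, 5 };

    // a read-only view over the private array; element writes do not compile
    public ReadOnlyCollection<int> MyIntArray
    {
        get { return Array.AsReadOnly(myIntArray); }
    }
}

class ReadOnlyViewDemo
{
    static void Main()
    {
        SafeIntArray ia = new SafeIntArray();
        // ia.MyIntArray[0] = 6;   // compile error: the indexer has no setter

        foreach (int i in ia.MyIntArray)
            Console.WriteLine(i);
    }
}
```

The wrapper does not copy the array, so the owning type can still see and make changes through the private field.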

# Decision Tree Classifier – Part 1

To continue my blogging on machine learning (ML) classifiers, I am turning to decision trees. The post will be in two parts: Part 1 provides an introduction to how decision trees work and how they are built; Part 2 will contain a C# implementation of an example decision tree classifier. As in all my posts, I prefer a clear, informal explanation over a terse mathematical one. So, any pedants out there – look away now!

Decision trees are a great choice for inductive inference and have been widely used for a long time in the field of Artificial Intelligence (AI). In this post we will cover the decision tree algorithm known as ID3.
There are several reasons why decision trees are great classifiers:

• decision trees are easy to understand;
• decision trees work well with messy data or missing values;
• decision trees give insight into complex data;
• decision trees can be visualized to allow for inference inspection or correction;
• decision tree building algorithms can be as simple or sophisticated as required (e.g. they can incorporate pruning, weights, etc.);

Decision trees work best with discrete classes. That is, the output class for each instance is a string, a boolean or an integer. If you are working with continuous values, you may consider rounding and mapping to a discrete output class, or you may need to look for another classifier. Decision trees are used for classification problems. For example, the taxonomy of organisms, plants, minerals, etc. lends itself naturally to decision tree classifiers. This is because in the field of taxonomy we are dealing with a set of records containing values for some attributes, where the values come from a finite set of known features, and each record may or may not have a classification. Medicine is another field that makes use of decision trees: almost all illnesses can be categorized by symptoms, so decision trees aid doctors in diagnosis.

It is time for an example, which I am borrowing from [1]. The tennis-playing example is to ML what ‘Hello World’ is to programming languages. We are given some data about the weather conditions that are appropriate for playing tennis. Our task is to construct a decision tree based on this data, and then use the tree to classify unseen weather conditions. The learning data is:

Table 1

| Outlook  | Temperature | Humidity | Wind   | Play Tennis |
|----------|-------------|----------|--------|-------------|
| sunny    | hot         | high     | strong | no          |
| sunny    | hot         | high     | weak   | no          |
| overcast | hot         | high     | weak   | yes         |
| rain     | mild        | high     | weak   | yes         |
| rain     | cool        | normal   | weak   | yes         |
| rain     | cool        | normal   | strong | no          |
| overcast | cool        | normal   | strong | yes         |
| sunny    | mild        | high     | weak   | no          |
| sunny    | cool        | normal   | weak   | yes         |
| rain     | mild        | normal   | weak   | yes         |
| sunny    | mild        | normal   | strong | yes         |
| overcast | mild        | high     | strong | yes         |
| overcast | hot         | normal   | weak   | yes         |
| rain     | mild        | high     | strong | no          |

Table 1 tells us which weather conditions are good for playing tennis outdoors. For example, if it is sunny, and the wind is weak, and the temperature is mild but it is very humid, then playing tennis outdoors is not advisable. On the other hand, on an overcast day, regardless of the wind, playing tennis outdoors should be OK.

The most fundamental concept behind decision tree classifiers is that of order. The amount of order in the data can be measured by assessing its consistency. For example, imagine that every row in Table 1 had a ‘yes’ in the Play Tennis column. This would tell us that regardless of the wind, temperature and humidity, we can always play tennis outside. The data would have perfect order, and we could build a perfect classifier for it – one that always says ‘yes’ and is always correct. The other extreme is where the outcome class differs for every observation. For an inductive learner like a decision tree, this would mean that it is impossible to classify a new instance unless it perfectly matches some instance in the training set. This is why a decision tree classifier won’t work for continuous class problems.

In information theory the amount of order (or rather, the lack of it) is represented by entropy. It is a scary word for something rather simple. The definition of entropy was proposed by the founder of information theory, Claude Shannon, and it is a probabilistic measure over categorical data defined as:

$Entropy(S) = -\sum_{i=1}^{n}p_i \log_2 p_i$

where S is the training data set whose target classification takes on n different values. The discrete target values are assigned their probabilities $p_i$, and $\log_2$ is the logarithm base 2; the base two is due to entropy being a measure of the expected encoding length in binary format (i.e. 0s and 1s). Entropy is largest when the target values are equally likely, and the closer entropy is to zero, the more order there is in the data.

We can now calculate the entropy of our data, one column at a time. Let’s take the Play Tennis column. We have 9 ‘yes’ and 5 ‘no’, which gives us an entropy of $Entropy(S)=-\frac{9}{14} \log_2(\frac{9}{14}) - \frac{5}{14}\log_2(\frac{5}{14})=0.940$, accurate to 3 d.p. There is a reason we started with the class column: it is used to measure the information gain we can achieve by splitting the data one attribute/column at a time, which is the core idea behind building decision trees. At each step we pick the attribute that achieves the greatest information gain and split the tree on it recursively. Information gain is, essentially, the reduction in entropy, and the root node of the decision tree is the attribute with the greatest information gain. Information gain is formally defined as:
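The entropy calculation is easy to check in C# (a throwaway sketch; the Entropy helper is my own):

```csharp
using System;

class EntropyDemo
{
    // Shannon entropy of a target class given the per-value counts
    public static double Entropy(params int[] counts)
    {
        int total = 0;
        foreach (int c in counts) total += c;

        double entropy = 0.0;
        foreach (int c in counts)
        {
            if (c == 0) continue;            // treat 0 * log(0) as 0
            double p = (double)c / total;
            entropy -= p * Math.Log(p, 2);   // log base 2
        }
        return entropy;
    }

    static void Main()
    {
        // Play Tennis column: 9 'yes', 5 'no'
        Console.WriteLine(Entropy(9, 5).ToString("F3")); // 0.940
    }
}
```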

$InfoGain(S,A) = Entropy(S) -\sum_{v \in Values(A)}\frac{|S_v|}{|S|}Entropy(S_v)$

where Values(A) is the set of all values attribute A takes. For example, the Wind column takes values in {strong, weak}, and the Humidity column takes values in {high, normal}. $|S_v|$ and $|S|$ are the number of records where A takes the value v, and the total number of records, respectively. Finally, $Entropy(S_v)$ is the entropy, with respect to the target class, of all records where attribute A takes on the value v. The last sentence is important: it means that, given all the records where attribute A takes on the same value v, we are interested in the dispersion of the target class. For example, take all the records where Wind is strong. Here we have 3 records where the target class is ‘no’ and 3 records where it is ‘yes’, so the corresponding entropy is $Entropy(S_{wind=strong})=-\frac{3}{6} \log_2(\frac{3}{6}) -\frac{3}{6} \log_2(\frac{3}{6})=1.0$

So, which attribute/column should be the tree’s root node? It should be the one that achieves the greatest information gain:

$InfoGain(S,Wind) = 0.940 - \frac{8}{14}\times 0.811 - \frac{6}{14}\times 1.0=0.048$
$InfoGain(S,Humidity) = 0.940 - \frac{7}{14}\times 0.985 - \frac{7}{14}\times 0.592=0.151$
$InfoGain(S,Temperature) = 0.940 - \frac{4}{14}\times 1.0 - \frac{4}{14}\times 0.811 - \frac{6}{14}\times 0.918=0.029$
$InfoGain(S,Outlook) = 0.940 - \frac{5}{14}\times 0.971 - \frac{4}{14}\times 0.0 - \frac{5}{14}\times 0.971=0.246$
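These numbers can be verified with a few lines of C# (again my own sketch, reusing the same entropy helper idea; note that computing with full precision gives 0.247 for Outlook, since the text rounds the intermediate entropies to three decimal places):

```csharp
using System;

class InfoGainDemo
{
    // entropy of a target class given the per-value counts
    public static double Entropy(params int[] counts)
    {
        int total = 0;
        foreach (int c in counts) total += c;

        double e = 0.0;
        foreach (int c in counts)
        {
            if (c == 0) continue;            // treat 0 * log(0) as 0
            double p = (double)c / total;
            e -= p * Math.Log(p, 2);
        }
        return e;
    }

    static void Main()
    {
        double entropyS = Entropy(9, 5);     // Play Tennis: 9 'yes', 5 'no'

        // Wind: weak -> 6 yes / 2 no, strong -> 3 yes / 3 no
        double gainWind = entropyS - 8.0 / 14 * Entropy(6, 2)
                                   - 6.0 / 14 * Entropy(3, 3);

        // Outlook: sunny -> 2 yes / 3 no, overcast -> 4 / 0, rain -> 3 / 2
        double gainOutlook = entropyS - 5.0 / 14 * Entropy(2, 3)
                                      - 4.0 / 14 * Entropy(4, 0)
                                      - 5.0 / 14 * Entropy(3, 2);

        Console.WriteLine(gainWind.ToString("F3"));    // 0.048
        Console.WriteLine(gainOutlook.ToString("F3")); // 0.247 at full precision
    }
}
```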

It is clear that Outlook achieves the greatest information gain and should be the root of the tree; the worst information gain is achieved by the Temperature attribute. This is what our tree now looks like:

At this stage the training data is split into three clusters. Where outlook is overcast, we always have the target class ‘yes’. We now proceed to recursively build sub-trees for the other two clusters. Where outlook is sunny, we have a total of 5 records, reproduced in Table 2 below. Out of the three remaining attributes we again pick the one that achieves the greatest information gain. If you inspect the data in Table 2, it is easy to see that Humidity is that attribute, since all five records split perfectly: if humidity is normal, then ‘yes’, else ‘no’. Its corresponding information gain is

$InfoGain(S_{sunny},Humidity) = 0.970 - \frac{3}{5}\times 0 - \frac{2}{5}\times 0=0.970$

where 0.970 is the entropy of the outlook-is-sunny cluster, again with respect to the Play Tennis class: $Entropy(S_{sunny}) = - \frac{3}{5}\log_2(\frac{3}{5})- \frac{2}{5}\log_2(\frac{2}{5})=0.970$.

Table 2

| Outlook | Temperature | Humidity | Wind   | Play Tennis |
|---------|-------------|----------|--------|-------------|
| sunny   | hot         | high     | strong | no          |
| sunny   | hot         | high     | weak   | no          |
| sunny   | mild        | high     | weak   | no          |
| sunny   | cool        | normal   | weak   | yes         |
| sunny   | mild        | normal   | strong | yes         |

At this stage we have classified 9 of the 14 records, and our tree has three terminal nodes (leaves): the overcast node and the two humidity nodes:

We can now deal with the records in the last outlook cluster, i.e. where outlook is rain. Here we have five records, which are reproduced below in Table 3:

Table 3

| Outlook | Temperature | Humidity | Wind   | Play Tennis |
|---------|-------------|----------|--------|-------------|
| rain    | mild        | high     | weak   | yes         |
| rain    | cool        | normal   | weak   | yes         |
| rain    | cool        | normal   | strong | no          |
| rain    | mild        | normal   | weak   | yes         |
| rain    | mild        | high     | strong | no          |

We apply the same algorithm as before, selecting for the next node the attribute that gives the greatest information gain. It is easy to spot that the Wind attribute achieves just that. Its information gain is

$InfoGain(S_{rain},Wind) = 0.970 - \frac{3}{5}\times 0 - \frac{2}{5}\times 0=0.970$

The complete tree looks like this:

A few things should be noted about the final decision tree. Firstly, the Temperature attribute is completely missing: the constructed tree allowed us to see that this attribute is redundant, at least for the given training set. The ID3 algorithm favors shorter trees over longer ones, which is the main part of its inductive bias (the algorithm’s policy). Secondly, this is not a binary tree, since we have parent nodes with more than two edges (i.e. the Outlook node). Finally, the tree is built with a greedy, one-way search: we never back-track to reconsider a parent node. In the grand scheme of things this implies that a decision tree can end up at a locally optimal, rather than globally optimal, solution.

The next blog post will be on how to implement the decision tree classifier in C#.

References:

[1] T. M. Mitchell. Machine Learning. McGraw-Hill, 1997.