Inverted Index in C#

Greetings my blog readers!

This is my last post for 2015. All in all 2015 has been a good year. Many people from all over the world have read my posts, tried my code. Some even subscribed…

In this post I will do something rather simple – write an inverted index builder in C#. What is an inverted index? It is the simplest form of document indexing to allow for performing boolean queries on text data. For example, imagine we have several text files and we would like to find out which of them contain all or some of the search terms. We can group search terms using regular boolean AND and OR expressions.

To illustrate, let’s say we have two files, 1.xt and 2.txt. The two files contains the following:

file 1: one star is sparkling bright
file 2: two stars are sparkling even brighter

An inverted index on such data is built in four steps:

  1. Collect the documents.
  2. Tokenize the text by parsing each element.
  3. Perform linguistic pre-processing (e.g. removing punctuation, removing capitalization, etc.) to normalize the elements.
  4. Create an index in a form of a dictionary, where keys are the elements and the values are the documents that contain them.

It is really as simple as that. I am not going to code the normalization step as this would remove the main emphasis of this post. Instead, I will provide two simple extensions to allow for an AND and OR queries on the index.

So, given our short example files, how does the inverted index look like? Here is a visual example:

Simple Inverted Index
Simple Inverted Index

I use C# Dictionary to store such an index. The AND and OR are provided as extensions, and for a moment can handle only two keywords. It would be simple to remove this restriction by passing an array or a list to the extensions methods instead. In the end, the code is just an example of using LINQ (Intersect, Union and SelectMany). If you end up using this code, please add your own exceptions handling.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace InvertedIndex
{

    public static class Extensions
    {
        public static IEnumerable<string> And (this Dictionary<string, List<string>> index, string firstTerm, string secondTerm)
        {
            return(from d in index
                         where d.Key.Equals(firstTerm)
                         select d.Value).SelectMany(x=>x).Intersect
                            ((from d in index
                             where d.Key.Equals(secondTerm)
                             select d.Value).SelectMany(x => x));
        }

        public static IEnumerable<string> Or(this Dictionary<string, List<string>> index, string firstTerm, string secondTerm)
        {
            return (from d in index
                        where d.Key.Equals(firstTerm) || d.Key.Equals(secondTerm)
                        select d.Value).SelectMany(x=>x).Distinct();
        }

    }

    class EntryPoint
    {
        public static Dictionary<string, List<string>> invertedIndex;
 
        static void Main(string[] args)
        {
            invertedIndex = new Dictionary<string, List<string>>();
            string folder = "C:\\Users\\Elena\\Documents\\Visual Studio 2013\\Projects\\InvertedIndex\\Files\\";

            foreach (string file in Directory.EnumerateFiles(folder, "*.txt"))
            {
                List<string> content = System.IO.File.ReadAllText(file).Split(' ').Distinct().ToList();
                addToIndex(content, file.Replace(folder, ""));
            }

            var resAnd = invertedIndex.And("star", "sparkling");
            var resOr = invertedIndex.Or("star", "sparkling");

            Console.ReadLine();
        }

        private static void addToIndex(List<string> words, string document)
        {
            foreach (var word in words)
            {
                if (!invertedIndex.ContainsKey(word))
                {
                    invertedIndex.Add(word, new List<string> { document });
                }
                else
                {
                    invertedIndex[word].Add(document);
                }
            }
        }
    }
}

Enjoy and Happy New Year!

P.S. And here is the GitHub link: https://github.com/elena-sharova/Inverted-Index.git

Advertisements

Fast Binomial Option Pricing Model

Greetings to my blog readers!

In this post I will share with you a fast Binomial Option Pricer. I’ve built this pricer without using trees or other bulky data structures and this is what makes it very fast. At the moment, the pricer is capable of pricing Vanilla Options. However, it can be easily extended to price barrier or digital options. This model is not suitable for pricing path-dependent (e.g. Asian options) yet. I will work on designing a model for path-dependent options later on. We’ll begin by revisiting the idea of the Binomial Model. Then, I explain how I designed my pricer. The pricer is coded in C# and split into five program files. Each file can be downloaded via links provided below. Please note my disclaimer.

Binomial Pricing Model

The Binomial Pricing Model (BPM) has been around for ages. It is based on the idea of risk-neutral world where the value of a portfolio of derivatives can be replicated with a portfolio of its underlying and bonds. The BPM is a discrete model in a sense that it breaks-up the time-period between now and the maturity date into discrete time-steps. Then, for each time-step the underlying spot price is evolved until maturity. The terminal underlying prices are used to obtain the first sequence of derivative’s values. Then, the BPM works backwards to the starting point. On each time-step the previously-obtained values of the derivative are used to calculated the newer values using the risk-neutral pricing formula (i.e. the discounted expectation). This is repeated until the first time-step is reached.

Worked Example

The BPM requires a tree-like structure to hold simulated prices of the underlying and, finally, to hold intermediate option prices. Let’s look at an example. Suppose I need to price a vanilla call option with the following parameters: current spot = £41.75; strike = £45.0; bond yield/rate = 0.535%; spot volatility = 34%; time to maturity = 2 months. The BPM requires me to simulate the spot price for some period covering 2 months. It is up to me into how many time-steps I divide this 2-months period. Let’s say I divide it into 2 periods. Then I need to simulate something like this: BPMExample

So, S0 is the current spot, S1 and S2 are simulated for each corresponding time-node. Each node is a result of the spot price moving up or down. There are two BPM types: the multiplicative and the additive. In this post I am only showing you the multiplicative model, which means that we have up and down factors that are multiplied into the current spot price to obtain the new spot. For example, if up factor is 1.103 and down factor is 0.9065 then our simulated two-step tree will look like this: BPMExampleWithNumbers

where 46.05 was obtained by multiplying 41.75 into 1.103, and so on. The spot underlying prices at time-step 2 can be used to price the call option with the following payoff: Max(X-K, 0), where X is a given terminal spot and K is the specified strike (e.g. we have specified £45.0). So, in terms of call prices at time S2 we have £5.80 for up_up, £0.0 for up_down and £0.0 for down_down. We now take these call prices and work back to the starting point. At this point we need the risk-neutral option pricing formula. In the BPM it takes this form:

V=exp(-r\cdot \delta t)\cdot [p\cdot V_{u}+(1-p)\cdot V_{d}]

where r is the bond yield/rate, delta t is the single time-step expressed in times to maturity, p is the risk-neutral probability of the price of the underlying to increase at the next time-step, and Vu and Vd are the option prices we have just calculated. In practice, both, the probability of price increase and the up/down factors are dictated by the SDE model. The supplied code is configured to work for Jarrow&Rudd, Cox&Ross&Rubinstein and the BS cases. Once we have the time S1 option prices, we proceed to calculate the time S0 option price by using the same risk-neutral formula and the time S1 option prices.

Designing Binomial Option Model for Speed

As far as coding the BPM, we need to make decisions about how to represent the simulated spot prices, how to represent the intermediate option prices, and how to glue together components that do the following: simulate price of the underlying for various SDEs, represent option payoff and the workings of the binomial model itself. Ideally, a large number of simulations should be possible (e.g. N=1,000). Taking a binary tree to represent simulated prices is not feasible because it would require a tree with 2^N -1 nodes. A key to a fast BPM is to recognize three things:

  1. We don’t need to store intermediate spot prices, only the final spot prices.
  2. The binomial tree is a recombining tree due to its discrete nature.
  3. Two price paths that are different by a single time outcome (i.e. up or down) are adjacent to each other.

The adopted design can be imagined as the following:PathRepresentation

The multiplicative model lends itself to a use of powers. Thus a price path where spot is increasing on each step is calculated as S*u*u*u…*u, etc. Because the paths that differ by a single outcome are adjacent, we don’t need a fancy data structure to store terminal prices of the underlying. A plain array of doubles works just fine. Finally, to implement the binomial model, we utilize only two arrays. The first array is populated with the first option prices. The second array is used to calculate option prices for one step backwards. While the original is now storing the copies of the second array one value at a time.

The full BPM project implemented in C# can be accessed via these links:

Main
SDE
StockSimulator
BinomialPricer
Payoff

An UPDATE: a careful reader noted that the SDE link was wrong. The SDE file now lets you download the correct file. Also, a very useful suggestion was made by the same reader – to replace the inner body of the double for loop in BinomialPricer.GetPrice with the following assignment. Indeed, the time it takes to price the example vanilla option drops from 3,833 to 2,299 milliseconds on my PC in debug release. Thank you!

...
 for (int i = timeSteps; i>=0; i--)
{
      for (int j=0; j<nodes; j++)
      {
          //tempOptionPrices[j] = instDRate*(upProb*firstOptionPrices[j] + downProb*firstOptionPrices[j + 1]);
          //firstOptionPrices[j] = tempOptionPrices[j]; // reuse firstoptionprices array for next iteration
          firstOptionPrices[j] = instDRate * (upProb * firstOptionPrices[j] + downProb * firstOptionPrices[j + 1]);
       }
       nodes--;
}

       //return tempOptionPrices[0];
       return firstOptionPrices[0];
...

Executing a Python Script from a C# Program

Python is a great programming language. It also has fantastic libraries (e.g. numPy, sciPy, pandas, etc.) that can significantly simplify and speed-up development. So, it is no surprise that sometimes when considering a coding solution for some problem you scratch your head and wish if you could only do it in Python (i.e. instead of C# or other low level programming language)!

In this short post I will show you how to execute a Python scrip and capture its output from a C# console-based program. Essentially, running a Python script from C# can be done either with IronPython dll referenced within your C# project and using the IronPython.Hosting library; or via kicking off a process that executes a Python program. The advantage of the later is that the script can be written in any Python flavor, provided you have installed the Python shell compatible with the script.

For this example I am using the following Python program saved as ‘hello.py’:

#!/usr/bin/env

import sys

def main():
    print('Hello')
    if len(sys.argv) >=3:
        x = sys.argv[1]
        y = sys.argv[2]
        # print concatenated parameters
        print(x+y)


if __name__=='__main__':
    main()

The following C# code will start a process and assign it to the Python interpreter with the Python program name passed as argument. The output is parsed as stream. However, the output can be written to a file instead.

using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

namespace ExecutingPythonScript
{
    class EntryPoint
    {
        static void Main(string[] args)
        {

            int x = 1;
            int y = 2;
            string progToRun = "C:\\Python27\\hello.py";
            char[] spliter = {'\r'};

            Process proc = new Process();
            proc.StartInfo.FileName = "python.exe";
            proc.StartInfo.RedirectStandardOutput = true;
            proc.StartInfo.UseShellExecute = false;
            
            // call hello.py to concatenate passed parameters
            proc.StartInfo.Arguments = string.Concat(progToRun, " ", x.ToString(), " ", y.ToString());
            proc.Start();

            StreamReader sReader = proc.StandardOutput;
            string[] output = sReader.ReadToEnd().Split(splitter);

            foreach (string s in output)
                Console.WriteLine(s);

            proc.WaitForExit();

            Console.ReadLine();
        }
    }
}

And that is all. It is that simple. One last thing to point out: progToRun could have been a string literal assigned to the actual Python code. This works well when you have a small task and don’t want to write a Python program to hold it.

Dependency Injection Patterns

Greetings to my blog readers!

This is a short post to review two dependency injection (DI) design patters: Constructor Injection and Property Injection. These two DIs are the easiest and the most commonly used patterns. There are other DI patters: Ambient Context and Method Injection. Personally I find them rather involved, and will skip them for now. Most of the background material for this post is gathered from Mark Seemann’s Dependency Injection in .Net. I highly recommend this book to anyone interested in DI patters.

Constructor Injection

Constructor injection is the most widely used DI pattern, and is the default patter for most of DI. It works by making all callers of a class to provide the dependency/ies as its public constructor parameter/s. The default or parameter-less constructor is never provided (remember, in C# if you provide your own constructor, the default one is scrapped). As you may have seen in my previous blog post, quite often the dependencies are supplied as interfaces. Here, any class that implements a particular interface can act as the required dependency. However, the passed dependency can be an instance of any object, so it does not have to be an interface.

If you examine the displayed code (i.e. not the pdf file code) from my previous blog on lines 15 through 20, you will see that I have added the guard clause to ensure the dependency is not null. This is an important point, since passing a null to a constructor expecting a reference or a nullable value type will compile, but won’t work. Also, marking the dependency as private and readonly, guarantees its integrity. The clear benefit of constructor DI is the separation of responsibilities, as well as improved code maintainability. If the dependency is meant to be optional, then the property injection could be used instead.

Property Injection

Property injection DI pattern (also known as a Setter Injection) is closely related to the constructor injection. This pattern is used when the dependency in a class is optional, and there may be a local to the class default that can be used instead. If the calling class needs to override the default behavior, it provides its own dependency by setting a public or an internal property. The default behavior is specified in the constructor of the dependent class (i.e. BasicInterpolation.ConsoleWriter). However, to decouple it Output class from interpolation code, I could have added a console-based subclass to the Output class itself.

Below is an example of a property injection DI pattern. I have changed the UserOutput.Output class from the interpolation code to now use this pattern. Note that I am keeping the guard clause to ensure the passed dependency is not null. However, I can no longer have the property as readonly. Also, we allow the property to be set only once:

namespace UserOutput
{
    #region GENERIC_OUTPUT
    internal interface IWriter
    {
        void Write(string message);
    }

    internal class Output
    {
        private IWriter outputwriter;

        public IWriter writer 
        {
            get
            {
                if (this.outputwriter == null)
                {
                    // provide with a default writer
                    this.outputwriter = new BasicInterpolation.ConsoleWriter();
                }
                    return this.outputwriter;
                
            }
            
            set
            {
              if (value == null)
              {
                  throw new ArgumentNullException("Null writer interface.");
              }

              if (this.outputwriter!=null)
              {
                  // writer can only be set once
                  throw new InvalidOperationException();
              }

              this.outputwriter = value;
            }
        }

    }
    #endregion
}

Conclusion

In the Design Patters world, the constructor injection is the simplest pattern. In simplicity, it is followed by the property injection pattern. The latter can be used when a default behavior is required.

Abstract Class vs. Interface and when to Use Both

Greetings to my blog readers!

I have recently come across a puzzling code example in one book. Here is a snapshot of the code:

public interface ITwoFactorPayoff
{ // The unambiguous specification of two-factor payoff contracts

	 // This interface can be specialized to various underlyings
	 double payoff(double factorI1, double factorII);
}
public abstract class MultiAssetPayoffStrategy: ITwoFactorPayoff
{

	 public abstract double payoff(double S1, double S2);
}

Can you see why I was puzzled? The web is full of advice for when to choose abstract class vs. interface and the other way around. Take for example, this MSDN link or this stackoverflow Q&A. Anyone who is not lazy has written something about this already. However, a discussion on when one would want to use both is somewhat lacking. And rightly so, you may say. The above code introduces a redundancy with no apparent benefit. Here, inheriting from MultiAssetPayoffStrategy class still requires to implement the payoff method. So, why not just use the interface? I would agree with you that the example is poorly designed (in spite of the fact that the book it comes from is an expensive one!). Nevertheless, it got me thinking – when would one actually want to have abstract classes implementing interfaces. Here are the reasons I have come up with. Would be interested to know if you can add anything to this.

Reason 1 – Tidy Code

This is the simplest reason, and the first one that came to my mind. Imagine you have 10 or more interfaces that, a class hierarchy you are working on requires to implement. That is, class A must implement interface1, interface2, …, interfaceN; class B also must implement these interfaces. In other words, a situation that looks like this:


public interface Interface1
{    
    void method_interface1();
}

public interface Interface2
{
   void method_interface2();
}
...
public class A: Interface1, Interface2
{
   void method_interface1() {...}
   void method_interface2() {...}
   ...
   
}

public class B: Interface1, Interface2
{
   void method_interface1() {...}
   void method_interface2() {...}
   ...
   
}

A way to simplify the code is to introduce either another interface that inherits from all 10 interfaces or simply an abstract class that does the same. Then, class A and class B just need to inherit from a single abstract class or a single interface.

Reason 2 – Providing Default Implementation for Some but Not Other Members

An abstract class could provide a default behavior for some methods. I admit, this is different to the book’s example since that method is an abstract one. But we all know that an abstract class can have non-abstract methods. The derived class then has an option on which methods it needs to implement. Here is a simple example (note that default implementation is decorated as virtual):

namespace OOPDesign
{
    public interface IPrice
    {
        double Price(double strike);
    }

    public interface IPrintPrice
    {
        void PrintPrice(double price);
    }

    public abstract class Pricer: IPrice, IPrintPrice
    {
        public abstract double Price(double strike);

        public virtual void PrintPrice(double price)
        {
            Console.WriteLine(price);
        }
    }


    internal sealed class OptionPricer: Pricer
    {
        public override double Price(double strike)
        {
            return strike + 15;
        }

    }

    class EntryPoint
    {
        static void Main(string[] args)
        {
            double strike = 20;
            OptionPricer op = new OptionPricer();
            op.PrintPrice(op.Price(strike));
            Console.ReadLine();
        }
    }
}

Reason 3 – Combining Class Hierarchy with Contracts

This is probably the most obvious reason for choosing abstract classes together with interfaces. An abstract class can serve as a design summary of what all derived classes should do. This can aid in implementing a well-known visitor pattern. To give some credit to the mentioned book, it eventually evolves the example code to be based on this design pattern. If you are not familiar with the visitor pattern, I suggest you google it. Below I am providing a C# implementation based on the Java example one can find in Wikipedia (although it is not a one-for-one translation, as I have simplified the code a bit). This visitor pattern example can be summed-up as follows. We start off with three contracts: 2 interfaces and one abstract class. The abstract class is here to tell all its concrete classes to implement one of the interfaces (the accept method). The other interface is for the “visiting” classes. These visiting classes don’t really know what they are visiting, as long as the concrete class can accept them (which is guaranteed since the concrete classes are of the parent type and implement the accept functionality.

namespace VisitorExample
{
    public interface ICarElementVisitor 
    {
        void visit(CarBodyPart part);
    }

    public interface ICarElement 
    {
        void accept(ICarElementVisitor visitor);
    }

    public abstract class CarBodyPart: ICarElement
    {
        public abstract void accept(ICarElementVisitor visitor);
        public abstract string Name { get;  }
        public abstract string Action { get;  }
    }

    public sealed class Wheel: CarBodyPart
    {
        private string name;
        private string action;

        public Wheel(string name, string action) 
        {
            this.name = name;
            this.action = action;
        }

        public override void accept(ICarElementVisitor visitor) 
        {
            visitor.visit(this);
        }

        public override string Name
        {
            get
            {
                return this.name; ;
            }
        }

        public override string Action
        {
            get
            {
                return this.action; ;
            }
        }
   
    }

    public sealed class Engine : CarBodyPart
    {

        private string name;
        private string action;

        public Engine(string name, string action) 
        {
            this.name = name;
            this.action = action;
        }

        public override void accept(ICarElementVisitor visitor) 
        {
            visitor.visit(this);
        }

        public override string Name
        {
            get
            {
                return this.name; ;
            }
        }

        public override string Action
        {
            get
            {
                return this.action; ;
            }
        }
    }

    public sealed class Body : CarBodyPart
    {

        private string name;
        private string action;

        public Body(string name, string action) 
        {
            this.name = name;
            this.action = action;
        }

        public override void accept(ICarElementVisitor visitor) 
        {
            visitor.visit(this);
        }

        public override string Name
        {
            get
            {
                return this.name; ;
            }
        }

        public override string Action
        {
            get
            {
                return this.action; ;
            }
        }
    }

    public sealed class Car : CarBodyPart
    {
        ICarElement[] elements;
        private string name;
        private string action;

        public Car() 
        {
            this.name = "car";
            this.action = "driving ";

            this.elements = new ICarElement[] 
            {   new Wheel("front left wheel", "Kicking my "),
                new Wheel("front right wheel", "Kicking my "), 
                new Wheel("back left wheel", "Kicking my ") ,
                new Wheel("back right wheel", "Kicking my "), 
                new Body("Body", "Moving my "), 
                new Engine("Engine", "Starting my ") 
            };
        }

        public override void accept(ICarElementVisitor visitor) 
        {    
            foreach(ICarElement elem in elements) 
            {
                elem.accept(visitor);
            }
            visitor.visit(this);    
        }

        public override string Name
        {
            get
            {
                return this.name; ;
            }
        }

        public override string Action
        {
            get
            {
                return this.action; ;
            }
        }
    }

    sealed class CarElementPrintVisitor:ICarElementVisitor 
    {
        public void visit(CarBodyPart part)
        {
            Console.WriteLine("Visiting " + part.Name);
        }
    }

    sealed class CarElementActionVisitor:ICarElementVisitor 
    {
        public void visit(CarBodyPart part)
        {
            Console.WriteLine(part.Action + part.Name);
        }
    }

    class EntryPoint
    {
        static void Main(string[] args)
        {
            ICarElement car = new Car();
            car.accept(new CarElementPrintVisitor());
            car.accept(new CarElementActionVisitor());
            Console.ReadLine();
        }
    }
}

Conclusion

Sometimes you end up accepting what you never thought you could accept…