# Inverted Index in C#

This is my last post for 2015. All in all 2015 has been a good year. Many people from all over the world have read my posts, tried my code. Some even subscribed…

In this post I will do something rather simple – write an inverted index builder in C#. What is an inverted index? It is the simplest form of document indexing to allow for performing boolean queries on text data. For example, imagine we have several text files and we would like to find out which of them contain all or some of the search terms. We can group search terms using regular boolean AND and OR expressions.

To illustrate, let’s say we have two files, 1.xt and 2.txt. The two files contains the following:

file 1: one star is sparkling bright
file 2: two stars are sparkling even brighter

An inverted index on such data is built in four steps:

1. Collect the documents.
2. Tokenize the text by parsing each element.
3. Perform linguistic pre-processing (e.g. removing punctuation, removing capitalization, etc.) to normalize the elements.
4. Create an index in a form of a dictionary, where keys are the elements and the values are the documents that contain them.

It is really as simple as that. I am not going to code the normalization step as this would remove the main emphasis of this post. Instead, I will provide two simple extensions to allow for an AND and OR queries on the index.

So, given our short example files, how does the inverted index look like? Here is a visual example:

I use C# Dictionary to store such an index. The AND and OR are provided as extensions, and for a moment can handle only two keywords. It would be simple to remove this restriction by passing an array or a list to the extensions methods instead. In the end, the code is just an example of using LINQ (Intersect, Union and SelectMany). If you end up using this code, please add your own exceptions handling.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace InvertedIndex
{

public static class Extensions
{
public static IEnumerable<string> And (this Dictionary<string, List<string>> index, string firstTerm, string secondTerm)
{
return(from d in index
where d.Key.Equals(firstTerm)
select d.Value).SelectMany(x=>x).Intersect
((from d in index
where d.Key.Equals(secondTerm)
select d.Value).SelectMany(x => x));
}

public static IEnumerable<string> Or(this Dictionary<string, List<string>> index, string firstTerm, string secondTerm)
{
return (from d in index
where d.Key.Equals(firstTerm) || d.Key.Equals(secondTerm)
select d.Value).SelectMany(x=>x).Distinct();
}

}

class EntryPoint
{
public static Dictionary<string, List<string>> invertedIndex;

static void Main(string[] args)
{
invertedIndex = new Dictionary<string, List<string>>();
string folder = "C:\\Users\\Elena\\Documents\\Visual Studio 2013\\Projects\\InvertedIndex\\Files\\";

foreach (string file in Directory.EnumerateFiles(folder, "*.txt"))
{
}

var resAnd = invertedIndex.And("star", "sparkling");
var resOr = invertedIndex.Or("star", "sparkling");

}

private static void addToIndex(List<string> words, string document)
{
foreach (var word in words)
{
if (!invertedIndex.ContainsKey(word))
{
invertedIndex.Add(word, new List<string> { document });
}
else
{
}
}
}
}
}



Enjoy and Happy New Year!

P.S. And here is the GitHub link: https://github.com/elena-sharova/Inverted-Index.git

# Fast Binomial Option Pricing Model

In this post I will share with you a fast Binomial Option Pricer. I’ve built this pricer without using trees or other bulky data structures and this is what makes it very fast. At the moment, the pricer is capable of pricing Vanilla Options. However, it can be easily extended to price barrier or digital options. This model is not suitable for pricing path-dependent (e.g. Asian options) yet. I will work on designing a model for path-dependent options later on. We’ll begin by revisiting the idea of the Binomial Model. Then, I explain how I designed my pricer. The pricer is coded in C# and split into five program files. Each file can be downloaded via links provided below. Please note my disclaimer.

#### Binomial Pricing Model

The Binomial Pricing Model (BPM) has been around for ages. It is based on the idea of risk-neutral world where the value of a portfolio of derivatives can be replicated with a portfolio of its underlying and bonds. The BPM is a discrete model in a sense that it breaks-up the time-period between now and the maturity date into discrete time-steps. Then, for each time-step the underlying spot price is evolved until maturity. The terminal underlying prices are used to obtain the first sequence of derivative’s values. Then, the BPM works backwards to the starting point. On each time-step the previously-obtained values of the derivative are used to calculated the newer values using the risk-neutral pricing formula (i.e. the discounted expectation). This is repeated until the first time-step is reached.

#### Worked Example

The BPM requires a tree-like structure to hold simulated prices of the underlying and, finally, to hold intermediate option prices. Let’s look at an example. Suppose I need to price a vanilla call option with the following parameters: current spot = £41.75; strike = £45.0; bond yield/rate = 0.535%; spot volatility = 34%; time to maturity = 2 months. The BPM requires me to simulate the spot price for some period covering 2 months. It is up to me into how many time-steps I divide this 2-months period. Let’s say I divide it into 2 periods. Then I need to simulate something like this:

So, S0 is the current spot, S1 and S2 are simulated for each corresponding time-node. Each node is a result of the spot price moving up or down. There are two BPM types: the multiplicative and the additive. In this post I am only showing you the multiplicative model, which means that we have up and down factors that are multiplied into the current spot price to obtain the new spot. For example, if up factor is 1.103 and down factor is 0.9065 then our simulated two-step tree will look like this:

where 46.05 was obtained by multiplying 41.75 into 1.103, and so on. The spot underlying prices at time-step 2 can be used to price the call option with the following payoff: Max(X-K, 0), where X is a given terminal spot and K is the specified strike (e.g. we have specified £45.0). So, in terms of call prices at time S2 we have £5.80 for up_up, £0.0 for up_down and £0.0 for down_down. We now take these call prices and work back to the starting point. At this point we need the risk-neutral option pricing formula. In the BPM it takes this form:

$V=exp(-r\cdot \delta t)\cdot [p\cdot V_{u}+(1-p)\cdot V_{d}]$

where r is the bond yield/rate, delta t is the single time-step expressed in times to maturity, p is the risk-neutral probability of the price of the underlying to increase at the next time-step, and Vu and Vd are the option prices we have just calculated. In practice, both, the probability of price increase and the up/down factors are dictated by the SDE model. The supplied code is configured to work for Jarrow&Rudd, Cox&Ross&Rubinstein and the BS cases. Once we have the time S1 option prices, we proceed to calculate the time S0 option price by using the same risk-neutral formula and the time S1 option prices.

#### Designing Binomial Option Model for Speed

As far as coding the BPM, we need to make decisions about how to represent the simulated spot prices, how to represent the intermediate option prices, and how to glue together components that do the following: simulate price of the underlying for various SDEs, represent option payoff and the workings of the binomial model itself. Ideally, a large number of simulations should be possible (e.g. N=1,000). Taking a binary tree to represent simulated prices is not feasible because it would require a tree with 2^N -1 nodes. A key to a fast BPM is to recognize three things:

1. We don’t need to store intermediate spot prices, only the final spot prices.
2. The binomial tree is a recombining tree due to its discrete nature.
3. Two price paths that are different by a single time outcome (i.e. up or down) are adjacent to each other.

The adopted design can be imagined as the following:

The multiplicative model lends itself to a use of powers. Thus a price path where spot is increasing on each step is calculated as S*u*u*u…*u, etc. Because the paths that differ by a single outcome are adjacent, we don’t need a fancy data structure to store terminal prices of the underlying. A plain array of doubles works just fine. Finally, to implement the binomial model, we utilize only two arrays. The first array is populated with the first option prices. The second array is used to calculate option prices for one step backwards. While the original is now storing the copies of the second array one value at a time.

The full BPM project implemented in C# can be accessed via these links:

An UPDATE: a careful reader noted that the SDE link was wrong. The SDE file now lets you download the correct file. Also, a very useful suggestion was made by the same reader – to replace the inner body of the double for loop in BinomialPricer.GetPrice with the following assignment. Indeed, the time it takes to price the example vanilla option drops from 3,833 to 2,299 milliseconds on my PC in debug release. Thank you!

...
for (int i = timeSteps; i>=0; i--)
{
for (int j=0; j<nodes; j++)
{
//tempOptionPrices[j] = instDRate*(upProb*firstOptionPrices[j] + downProb*firstOptionPrices[j + 1]);
//firstOptionPrices[j] = tempOptionPrices[j]; // reuse firstoptionprices array for next iteration
firstOptionPrices[j] = instDRate * (upProb * firstOptionPrices[j] + downProb * firstOptionPrices[j + 1]);
}
nodes--;
}

//return tempOptionPrices[0];
return firstOptionPrices[0];
...


# Executing a Python Script from a C# Program

Python is a great programming language. It also has fantastic libraries (e.g. numPy, sciPy, pandas, etc.) that can significantly simplify and speed-up development. So, it is no surprise that sometimes when considering a coding solution for some problem you scratch your head and wish if you could only do it in Python (i.e. instead of C# or other low level programming language)!

In this short post I will show you how to execute a Python scrip and capture its output from a C# console-based program. Essentially, running a Python script from C# can be done either with IronPython dll referenced within your C# project and using the IronPython.Hosting library; or via kicking off a process that executes a Python program. The advantage of the later is that the script can be written in any Python flavor, provided you have installed the Python shell compatible with the script.

For this example I am using the following Python program saved as ‘hello.py’:

#!/usr/bin/env</font>
<font size="3">

import sys

def main():
print('Hello')
if len(sys.argv) &gt;=3:
x = sys.argv[1]
y = sys.argv[2]
# print concatenated parameters
print(x+y)
</font>
<font size="3">if __name__=='__main__':
main()


The following C# code will start a process and assign it to the Python interpreter with the Python program name passed as argument. The output is parsed as stream. However, the output can be written to a file instead.

using System;
using System.Diagnostics;
using System.IO;
<font size="3">

namespace ExecutingPythonScript
{
class EntryPoint
{
static void Main(string[] args)
{

int x = 1;
int y = 2;
string progToRun = "C:\\Python27\\hello.py";
char[] splitter = {'\r'};

Process proc = new Process();
proc.StartInfo.FileName = "python.exe";
proc.StartInfo.RedirectStandardOutput = true;
proc.StartInfo.UseShellExecute = false;

// call hello.py to concatenate passed parameters
proc.StartInfo.Arguments = string.Concat(progToRun, " ", x.ToString(), " ", y.ToString());
proc.Start();

foreach (string s in output)
Console.WriteLine(s);

proc.WaitForExit();
</font>
}
}
}


And that is all. It is that simple. One last thing to point out: progToRun could have been a string literal assigned to the actual Python code. This works well when you have a small task and don’t want to write a Python program to hold it.

***Thanks to Johannes for spotting a coding type in the variable name, which is now fixed.

# Dependency Injection Patterns

This is a short post to review two dependency injection (DI) design patters: Constructor Injection and Property Injection. These two DIs are the easiest and the most commonly used patterns. There are other DI patters: Ambient Context and Method Injection. Personally I find them rather involved, and will skip them for now. Most of the background material for this post is gathered from Mark Seemann’s Dependency Injection in .Net. I highly recommend this book to anyone interested in DI patters.

#### Constructor Injection

Constructor injection is the most widely used DI pattern, and is the default patter for most of DI. It works by making all callers of a class to provide the dependency/ies as its public constructor parameter/s. The default or parameter-less constructor is never provided (remember, in C# if you provide your own constructor, the default one is scrapped). As you may have seen in my previous blog post, quite often the dependencies are supplied as interfaces. Here, any class that implements a particular interface can act as the required dependency. However, the passed dependency can be an instance of any object, so it does not have to be an interface.

If you examine the displayed code (i.e. not the pdf file code) from my previous blog on lines 15 through 20, you will see that I have added the guard clause to ensure the dependency is not null. This is an important point, since passing a null to a constructor expecting a reference or a nullable value type will compile, but won’t work. Also, marking the dependency as private and readonly, guarantees its integrity. The clear benefit of constructor DI is the separation of responsibilities, as well as improved code maintainability. If the dependency is meant to be optional, then the property injection could be used instead.

#### Property Injection

Property injection DI pattern (also known as a Setter Injection) is closely related to the constructor injection. This pattern is used when the dependency in a class is optional, and there may be a local to the class default that can be used instead. If the calling class needs to override the default behavior, it provides its own dependency by setting a public or an internal property. The default behavior is specified in the constructor of the dependent class (i.e. BasicInterpolation.ConsoleWriter). However, to decouple it Output class from interpolation code, I could have added a console-based subclass to the Output class itself.

Below is an example of a property injection DI pattern. I have changed the UserOutput.Output class from the interpolation code to now use this pattern. Note that I am keeping the guard clause to ensure the passed dependency is not null. However, I can no longer have the property as readonly. Also, we allow the property to be set only once:

namespace UserOutput
{
#region GENERIC_OUTPUT
internal interface IWriter
{
void Write(string message);
}

internal class Output
{
private IWriter outputwriter;

public IWriter writer
{
get
{
if (this.outputwriter == null)
{
// provide with a default writer
this.outputwriter = new BasicInterpolation.ConsoleWriter();
}
return this.outputwriter;

}

set
{
if (value == null)
{
throw new ArgumentNullException("Null writer interface.");
}

if (this.outputwriter!=null)
{
// writer can only be set once
throw new InvalidOperationException();
}

this.outputwriter = value;
}
}

}
#endregion
}


#### Conclusion

In the Design Patters world, the constructor injection is the simplest pattern. In simplicity, it is followed by the property injection pattern. The latter can be used when a default behavior is required.

# Computing Bond Yield with Newton’s Method

In fixed income analysis it is often required to calculate the yield of some coupon paying instrument, for example a bond. Very simply, an yield of a bond is defined as a single rate of return. Under continuous compounding, given a set of interest rates $\{r_{T_1}, r_{T_2}, ..., r_{T_n}\}$, the yield $y$ equates the present value of the bond computed under a given set of rates. In other words:

$\sum_{i=1}^{n}exp(-r_{T_i}T_i) C_{T_i}=\sum_{i=1}^{n}exp(-yT_i) C_{T_i}$

where $C_{T_i}$ is a coupon paid out at time $T_i$. Usually, one solves for $y$ using some iterative guess-work, e.g. goal seek. In this post I will show you a well-know Newton’s recursive method that can be used to base such goal seek on.

### The Newton’s Method

The Newton’s method is based on a recursive approximation formula:

$x_{i+1}=x_i-\frac{f(x_i)}{f'(x_i)}$

Let $B=\sum_{i=1}^{n}exp(-yT_i) C_{T_i}$ be the price(or present value) of the bond expressed via its yield. Then, taking $x$ to represent interest rate, we have:
$f(x) = \sum_{i=1}^{n}exp(-x_{T_i}T_i) C_{T_i} - B = 0$ and $f'(x)=-\sum_{i=1}^{n}T_i C_{T_i} exp (-x_{T_i}T_i)$

The recursive formula to solve for yield becomes:

$x_{i+1}=x_i+\frac{\sum_{i=1}^{n}exp(-x_{T_i}T_i) C_{T_i} - B}{\sum_{i=1}^{n}T_i C_{T_i} exp (-x_{T_i}T_i)}$

### Goal Seek in C#

Now let’s take a look at an example implementation. The while loop in CalculateYield method implements the goal seek. This is a very simple example, and we pretend to have the bond price.

using System;
using System.Collections.Generic;
using System.Linq;

namespace YieldCalculator
{
class EntryPoint
{
static void Main(string[] args)
{
// use some numbers to illustrate the point
// maturities in months
int[] maturities = { 6, 12, 18, 24, 30, 36 };
double[] yearFrac = maturities.Select(i => i / 12.0).ToArray();
// bond principle
double P = 100;
// bond price
double B = 108;
double couponRate = 0.1;
// calculate future cashflows
double[] cashflows = maturities.Select(i => (P * couponRate) / 2.0).ToArray();
cashflows[cashflows.Count() - 1] = P + cashflows[cashflows.Count() - 1];

double initialGuess = 0.1;
Console.WriteLine(CalculateYield(initialGuess, cashflows, yearFrac, B));
}

static double CalculateYield(double initialGuess, double[] cashflows, double[] yearFrac, double B)
{
double error = 0.000000001;
double x_i = initialGuess-1.0;
double x_i_next = initialGuess;

double numerator;
double denominator;

while (Math.Abs(x_i_next - x_i) > error)
{
x_i = x_i_next;
// linq's Zip is handy to perform a sum over expressions involving several arrays
numerator = cashflows.Zip(yearFrac, (x,y) => (x * Math.Exp(y *-1*x_i))).Sum();
denominator = yearFrac.Zip(cashflows, (x,y) => (x*y*Math.Exp(x*-1*x_i))).Sum();
x_i_next = x_i + (numerator - B) / denominator;
}
return x_i_next;
}
}
}


Note how handy is LINQ’s Zip function to calculate a sum over an expression involving two arrays.

And now to the burning question – why the picture of a cat? The answer is even simpler than this post: It is a cute cat, so, why not?