Jun 16, 2009

How to compile ruby from source on Windows?

I wanted to play with the latest version of ruby. I got the win32 binary zip of 1.9.1 p0 - however irb used to popup an error message on startup - used it for sometime but then I decided to fix it. Checked back on ruby's download page and found an updated version p129. Downloaded it but it was a source zip.

I wasn't sure that compiling it from source would be trivial... however it reinforced the simplicity of the ruby way. Summarizing the steps I found on this post at ruby-forum.com

  1. Get the source zip from the download page and unzip it locally to say c:\ruby-src
  2. Open the Visual Studio Command prompt (I have VS2008 on WinXP SP2)
  3. Make a folder for your build say c:\ruby-build. Move into it / change dir
  4. c:\ruby-build> c:\ruby-src\ruby-1.9.1-p129\win32\configure.bat
  5. c:\ruby-build> nmake
  6. c:\ruby-build> nmake test
  7. c:\ruby-build> nmake DESTDIR=c:/ruby-build install
That's it.. you'll find the binaries within a usr folder within the ruby-build folder.

Jun 15, 2009

How to create a global (system) windows hook ?

What are Windows Hooks? You probably don't want to know that :) Good starting points are
The MSDN door to hooks
Win32 Hooks


A global / system hook would be “I want to be called when anyone does X in any process” ; the other kind is when “I want to be called when anyone does X in ThreadID#Y”. Creating a global windows hook cannot be done in managed code except (some keyboard/mouse hooks) via P/Invoke (so save yourself some time although local hooks are possible).
Windows would callback on the Hook function ; also called the filter function. The signature -


LRESULT CALLBACK FilterFunctionForHook(int code, WPARAM wParam, LPARAM lParam)

For global hooks, the filter function should be contained in an unmanaged DLL – I created one called WinHookFacade.dll. This unmanaged DLL would be injected into each process. Also the copy of the DLL loaded into other process cannot be explicitly unloaded – will unload when the process goes down. Also any bad code in the hook function can result in bad scary things and may require the user to go through a logoff-login or reboot cycle. Avoid if possible because global hooks will extract their pound of performance. But when has that stopped us programmers ? ;) Hence exercise caution with global hooks.

To create a global hook, you therefore need to create an exe project that hooks/unhooks the filter function and a dll project that contains the filter function.

The Hook




void Hook()
{
HINSTANCE hCurrentDll = GetModuleHandle(_T("WinHookFacade.dll"));
g_HookHandle = SetWindowsHookEx(WH_CBT,
FilterFunctionForHook,
hCurrentDll,
0);

if (g_HookHandle == NULL)
throw new std::exception("Unable to hook");
}


The key here would be the SetWindowsHookEx Win32 API function. The first parameter is what type of hook you’d like to register for – See this page for available types. In this example, WH_CBT means events related to windows (creation,activation,destructions etc). The second parameter is the name of the Hook/Filter function that shall be called back by the OS – detailed below. The third parameter is a handle to the DLL containing the hook function for global hooks (if null, it indicates a thread-specific hook). The fourth parameter is 0 for global hooks (for thread/local hooks, it is the Thread ID you want to hook into).

The Callback




LRESULT CALLBACK FilterFunctionForHook(int code, WPARAM wParam, LPARAM lParam)
{
if (code >= 0)
{
//your code e.g.
switch(code)
{
case HCBT_ACTIVATE:
// log the window title
...
}
return CallNextHookEx(g_HookHandle, code, wParam, lParam);
}


This function would be called in the process in which the event occurs (e.g. where a window is created). Process Isolation : a process cannot run custom code into another process; hence DLL injection. Also the CallNextHookEx Win32 API function must be called to pass control onto the next filter function in queue passing it the original arguments received to be a good citizen. The first parameter is the Hook Handle obtained when you registered the Hook function.

IMPORTANT: However this value must be the same across all instances of the shared Hook DLL & must be available within the filter function. For this reason, you need to create the “hook handle” in a shared data segment Whaa.. how do I do that? Declare your HHOOK variable like this


#pragma data_seg (".MY_HOOK_DATA")
HHOOK g_HookHandle = 0;
#pragma data_seg()


This function is supposed to call the next function in queue without doing anything if the code argument is less than zero. If not, the function may execute custom code to do specific tasks. The wParam and lParam arguments are pointers to handles/structures containing additional info. The exact type of structure depends on the value of the code parameter – check msdn for the page for the specific alias of the filterfunction e.g. in my case it would be CBTPROC; wParam is a handle to the window and lParam is a pointer to a struct

Cleanup



void Unhook()
{
if (!UnhookWindowsHookEx(g_HookHandle))
throw new std::exception("Unhook failed!");

g_HookHandle = NULL;
}


Here the primary player is the UnhookWindowsHookEx Win32 API Function. In case of an error, you’d need to use the fugly GetLastError() function to know what really went wrong.

Test Drive


Now to take this for a spin, create an exe project referencing the DLL containing the hook function. The main function will call on the exported functions as shown below..

int _tmain(int argc, _TCHAR* argv[])
{
char buffer[256];
try
{
printf("Hooking...");
Hook();
printf("Done.\n");


printf("Press Enter to exit\n"); // hook is running
gets(buffer);

printf("Unhooking...");
Unhook();
printf("Done. Woohoo!!\n");





That’s all there is to it.
CAUTION: Also if you get something wrong, the effects would be felt immediately – in my case, the UI would freeze as soon as my botched hook function was registered; my taskbar would be nuked and random error dialogs from running apps. But nothing a reboot can’t solve! C++ string compiler errors, WinAPI gotchas & hooks -- It was like walking on a tightrope in dark windy conditions. <runs back to the managed side>

Apr 22, 2009

Ruby is slow... until you learn to write good code.

I wrote up a small program to find prime-numbers between x and y ; implemented the Sieve of Eratosthenes.
The SOE is basically where you take all numbers (say 1-100) into an array. Cross out 1. Loop from 2-Sqrt(N) i.e. 2-10. Now cross out every multiple of 2 in the array. Next cross out every multiple of 3.. and so on. Finally all the numbers that aren't crossed out are prime.

However I did a very simplistic implementation (test-driven) to boot. All the tests did pass.. however running it to get all primes between 10K and 100K took 34 secs to execute. And I was on the new n improved ruby 1.9 with a better VM. I switched to the stable ruby 1.8.6 and it took ~60 secs. So now I turned back to my primary Weapon C#. The same steps implemented in C# took 722 msecs. So now there were 2 possibilities
1. Ruby is sloooow (the usual gripe)
2. Something in my algorithm is messed up.

So I posted my source onto StackOverflow - (my current fave time sink :) to get some eyeballs to look at the problem.
So here's my naive implementation


class PrimeGenerator
def self.get_primes_between( x, y)
sieve_array = Array.new(y) {|index|
(index == 0 ? 0 : index+1)
}

position_when_we_can_stop_checking = Math.sqrt(y).to_i
(2..position_when_we_can_stop_checking).each{|factor|
sieve_array[(factor).. (y-1)].each{|number|
sieve_array[number-1] = 0 if isMultipleOf(number, factor)
}
}

sieve_array.select{|element|
( (element != 0) && ( (x..y).include? element) )
}
end
def self.isMultipleOf(x, y)
return (x % y) == 0
end
end


# Benchmarking code

require 'benchmark'
Benchmark.bm(30) do |r|
r.report("Gishu") { a = PrimeGenerator.get_primes_between( 1_000, 10_000) }
end



A few people got back with what might be the problem.
One of the first and most voted issue was the slicing the array to get a new subset array. That's got to be expensive inside a loop for an array of this size. Obviously I was using for loops in C# (since C# doesn't have Ruby's each)


for index in factor..y-1 do
sieve_array[index] = 0 if isMultipleOf(index+1, factor)
end


That shaved off 8 secs - 24%. But still 26 secs. Another optimization was to reduce iterations - instead of checking each number with IsMultiple(number, _loopVar), set the 'step' for the loop to equal _loopVar (so from 2 you go 4,6,8,10...)

And finally Mike Woodhouse dispatched a homer. He posted a tiny snippet that did the same thing in under 1 sec. No that's not a typo.


def sieve_to(n)
s = (0..n).to_a
s[0]=s[1]=nil
s.each do |p|
next unless p
break if p * p > n
(p*p).step(n, p) { |m| s[m] = nil }
end
s.compact
end

# Usage:
# puts sieve_to(11).select{|x| x>=3}.inspect



user system total real
Gishu 27.219000 0.000000 27.219000 ( 27.375000)
Mike 0.141000 0.000000 0.141000 ( 0.140625)

I then had to know how he did it..

  1. Smart#1: He skips if array[_loopvar] is already 0. whereas if array[6] is crossed out due to 3, I did a scanning run to set all multiples of 6 to 0 which were already 0.(now down to 6 secs)

  2. Smart#2: Use the Numeric#step trick to jump to the next multiple vs traversing entire list with an IsMultiple check (now down to 0.25 secs)

  3. Smart#3: He begins the inner loop with Square of p. e.g if _loopvar is 7, I used to check from 8->end. Mike checks from 49->end. This one was a little tricky to figure out.. the reason is 7x2, 7x3, 7x4, 7x5, 7x6 have already been crossed out when _loopVar was 2,3,4,5,6 respectively. (now down to 0.21 secs)

  4. Smart#4: Smart use of Ruby's nil for crossed out vs my choice of 0. That followed by Array#compact reduces the number of elements for the final select operation to get primes between x and y (now down to 0.18 secs.)



So after all that it's Ruby 0.15 secs and C# (with same optimizations) 0.028 secs, much more easy to take in.

An interesting thought to end this post is that even with the naive first implementation C# performed reasonably well... under a sec. I'd have moved on to the next task. Ruby on the other hand slowed to a crawl forcing me to take another look for optimizations. In a indirect sort of way, C#'s superior performance could be hiding bad/substandard solutions and letting programmers get away with it...

Apr 7, 2009

MergeSort in Ruby

Finally got the ball rolling on the 'Introduction to Algorithms' book.. What I figured was that I implement the sorting algorithms in ruby - help me get better at ruby AND algorithms.
I've got to confess.. I'm test-infected so I'm gonna TDD my way out. This series of posts are not going to be in a walkthrough style of narration.. I'm just going to post complete samples. The order of methods indicate how the code was built up.. (the first method is the 'oldest'.

Here's merge_sort_test.rb [The Test Fixture]
--

require 'test/unit'
require 'merge_sort'
class TestMergeSort < Test::Unit::TestCase
def test_partition_array_with_even_number_of_elements
left, right = MergeSort.partition_array [11, 5, 9, 15]
assert_equal [11, 5], left
assert_equal [9, 15], right
end
def test_partition_array_with_odd_number_of_elements
left, right = MergeSort.partition_array [11, 5, 45, 9, 15]
assert_equal [11, 5], left
assert_equal [45, 9, 15], right
end

def test_merge_arrays
assert_equal( [10,20,30,40],
MergeSort.merge_arrays( [10,30], [20,40] ) )

assert_equal( [1, 3, 5, 9],
MergeSort.merge_arrays( [1, 5, 9], [3] ) )
assert_equal( [1, 3, 5, 9],
MergeSort.merge_arrays( [3], [1,5,9] ) )
end

def test_sort
assert_equal [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
MergeSort.sort([ 8, 3, 4, 0, 9, 1, 2, 7, 6, 5])
end
end


--
merge_sort.rb [The Implementation]
--
class MergeSort 
def self.partition_array( input_array)
mid_index = (input_array.length / 2) - 1
[input_array[0..mid_index], input_array[(mid_index+1)..(input_array.length-1)]]
end

def self.merge_arrays(array1, array2)
merged_array = []

until (array1.empty? || array2.empty?) do
if (array1[0] < array2[0])
merged_array << array1.shift
else
merged_array << array2.shift
end
end

merged_array + array1 + array2
end

def self.sort( input_array )
return input_array if (input_array.length < 2)

left, right = self.partition_array input_array
sortedLeft = self.sort left
sortedRight = self.sort right
y = self.merge_arrays( sortedLeft, sortedRight )
end
end



--
A command line 'runner'/'main' function
--
# Console runner / main
if !ARGV[0].nil?
input = []
ARGV.each{|x| input << x.to_i}
print MergeSort.sort(input).inspect
end


--
>ruby merge_sort.rb 90, 100, 40, 10, 45
[10, 40, 45, 90, 100]

And we're done! IMHO The Ruby Array class really shines here w.r.t. readability.

Feb 3, 2009

Dynamic evaluation of an expression

In this post, I'll tackle Step#2 of evaluating an expression like 2 * (3 + 5) and return 16.

  1. Convert expression from infix to postfix (see the last 2 posts)
  2. Evaluate postfix expression to determine result
As always, write a new test for ExpressionConverter

[Test]
public void TestEvaluate()
{
ExpressionConverter converter = new ExpressionConverter();
Assert.AreEqual(10, converter.EvaluateInfixExpression("((1+2) + 3) * 8/4"));
}


The implementation is very simple now that we have the ability to 'convert to postfix'.

public object EvaluateInfixExpression(string sInfixExpression)
{
List<object> postfixExpression = this.ConvertInfixToPostfix(sInfixExpression);
Stack<object> stack = new Stack<object();
foreach (object term in postfixExpression)
{
if (term is int)
{
stack.Push(term);
}
else
{
Operator operatorTerm = term as Operator;
int iValue2 = (int) stack.Pop();
int iValue1 = (int) stack.Pop();
stack.Push( operatorTerm.Evaluate(iValue1, iValue2) );
}
}
return stack.Peek();
}

Except that we don't have Operator#Evaluate()... Lets get over that hurdle. Mark this test with a temp Ignore tag. Write a new test in the TestOperator fixture

[Test]
public void TestPlusOperator_Evaluate()
{
Assert.AreEqual(30, Operator.Plus.Evaluate(10, 20));
}

If each operator could have a associated evaluate code block that evaluated 2 parameters and returned the result, we'd be done. A new member variable, a new parameterized ctor and a new code block later.. Here are the changes

public class Operator
{
public static readonly Operator Plus = new Operator("+", 1, (x, y) => x + y );
private Func<int,int,int> m_evaluateDelegate;

private Operator(string sSymbol, int iPrecedenceLevel, Func<int,int,int> evaluateDelegate):this(sSymbol, iPrecedenceLevel)
{
m_evaluateDelegate = evaluateDelegate;
}

public int Evaluate(int iValue1, int iValue2)
{
return m_evaluateDelegate(iValue1,iValue2);
}


Green. Ditto for the other operators -, *, /
All done. Finally a console app to do this


static void Main(string[] args)
{
ExpressionConverter expr = new ExpressionConverter();

Console.WriteLine("Postfix version is ");
foreach (object term in expr.ConvertInfixToPostfix(args[0]))
{ Console.Write(term + " "); }
Console.WriteLine();

Console.WriteLine("Expression evaluated to " + expr.EvaluateInfixExpression(args[0]) );
}


This is how it works.
Release>DynCalc.exe "((1+2) + 3) * 2 - 8/4"
Postfix version is
1 2 + 3 + 2 * 8 4 / -
Expression evaluated to 10


Now ExpressionConverter is no longer a converter.. we should probably rename it to Expression. Also it has no instance state. All methods can therefore be converted to static methods - no need to create an instance to invoke members. Done.
Other than non-happy paths like malformed expressions.. (can be done) it's all good. End of series. I had fun..

Test driving an infix to postfix expression converter (2 of 3)

So at the end of Post 1 of this series. we've a converter that doesn't yet understand operator precedence.
Let me refresh my memory before proceeding..
An operator token belongs to a class/family of 'Operators', has a string representation like *, +, /, - and finally has a relative precedence associated with it.
1 for +, -
2 for *, /
3 for (, )

Seems like we need an Operator class instead of just a string. I'll mark the failing test with [Ignore("Detour")] tag while we go make some operators.
Create a new test fixture, imagine a magical Operator class.. something like this


TestOperator.cs


[Test]
public void TestParse_PlusOperator()
{
Operator plusOp = Operator.Parse("+");
Assert.IsNotNull(plusOp);
Assert.AreEqual("+", plusOp.ToString());
Assert.AreEqual(1, plusOp.PrecedenceLevel);
}


I turn my ideas about Operators into a Type/Class.

public class Operator
{
private string m_OperatorSymbol;
private int m_iPrecedenceLevel;
private static readonly Operator plusOperator = new Operator("+", 1);

private Operator(string sSymbol, int iPrecedenceLevel)
{
m_OperatorSymbol = sSymbol;
m_iPrecedenceLevel = iPrecedenceLevel;
}

public int PrecedenceLevel
{
get { return m_iPrecedenceLevel; }
}
public override string ToString()
{ return m_OperatorSymbol; }


public static Operator Parse(string stringToParse)
{ return Operator.plusOperator; }
}



Green! I took the quick way out with the Parse implementation.. I'll let the coming tests force me to distill that bit of code. I add similar tests for Minus (and then Multiply and Divide). I end up with a need for a 'switch case' in Parse().
Also Parse should probably throw if its passed a token that it doesnt recognize. I'll note that down as a test too.


[Test]
[ExpectedException(ExceptionType=typeof(ArgumentException), ExpectedMessage="Unknown Operator! Cannot parse Gishu")]
public void TestParse_InvalidInput()
{
Operator noOp = Operator.Parse("Gishu");
Assert.Fail("Operator learnt some AI to parse Gishu. This should have blown up by now!");
}


[Furious typing] Alright now Operator.Parse() looks like this.. all its tests pass.

public static Operator Parse(string stringToParse)
{
switch (stringToParse)
{
case "+":
return Operator.plusOperator;
case "-":
return Operator.minusOperator;
case "*":
return Operator.multiplyOperator;
case "/":
return Operator.divisionOperator;

default:
throw new ArgumentException("Unknown Operator! Cannot parse " + stringToParse);
}
}


Now lets update our tokenizer to use this new type. This looks like it might break a few tests.. but with baby steps and a vigilant eye, it'll all be better by the time I'm done with it. I'll start by modifying one of my initial tokenizer tests...

[RowTest]
[Row("+")]
[Row(" + ")]
public void ReadPlusOperator(string sInput)
{
Tokenizer t = new Tokenizer(sInput);

//Assert.AreEqual("+", t.GetNextToken());
Assert.AreEqual(Operator.Plus, t.GetNextToken());
}


This would require us to make the static plusOperator member public. The other option is to check each member of the operator object… which would be duplication of the testfixture for Operator. So lets share our operators with the world.. make em public and rename.

public class Operator
{
public static readonly Operator Plus = new Operator("+", 1);
//...

And we're red...
TestDynCalc.TestTokenizer.ReadPlusOperator( + ):
Expected: <+>
But was: "+"


Instead of making a big bang change.. let’s make an ugly hack inside Tokenizer#GetNextToken() to make just this test pass..

Regex regexOperator = new Regex(@"^\s*([-+*/()])");
obResults = regexOperator.Match(m_sInputData);
if (obResults.Success)
{
Group obGroup = obResults.Groups[1];
RemoveReadCharacters(obGroup);

//return obGroup.Value;
if (obGroup.Value != "+")
return obGroup.Value;
else
return Operator.Plus;

}

I broke 2 tests - that tested the tokenizer with complete expressions containining +. Turns out its easy to fix them.

[Test]
public void ReadMultipleTokens()
{
Tokenizer t = new Tokenizer(" 2 + 3 - 34 * 2 / 1 ");
//object[] arrExpectedTokens = new object[] { 2, "+", 3, "-", 34, "*", 2, "/", 1 };
object[] arrExpectedTokens = new object[] { 2, Operator.Plus, 3, "-", 34, "*", 2, "/", 1 };
foreach (object token in arrExpectedTokens)
{
Assert.AreEqual(token, t.GetNextToken());
}
}

All green. Excuse me while I repeat the procedure for all the other operators. The hack in Tokenizer#GetNextToken() can now be replaced by ...

return Operator.Parse(obGroup.Value);


However we forgot the parantheses bros. while creating our operators. We need more operators! Mark Tokenizer with Ignore(“2 TDDed operators coming right up…”).. [more furious typing] we have new operators. Take off the ignore tag. Similar changes… and Tokenizer is now Operator-aware.
Time to take off the ignore tag on ExpressionConverter tests and end our little detour.
So we're back to red.. but this time, we've operators with precedence. [after some mental acrobatics with a pencil and an eraser on paper]
The key to getting this is to check if the operator on top of stack is not a '(' and has higher precedence than the incoming/current token. If yes, pop the stack, write popped operator to output and retry. If no, push the incoming token onto the stack.
So + should not be pushed if the top of stack[TOS] contains *; however * should be if TOS is +.
So we add some more code to the final else block in ExpressionConverter#ConvertToPostfix..


if (obStack.Count() > 0)
{
Operator termOnTopOfStack = obStack.Peek() as Operator;
if ((termOnTopOfStack != Operator.OpenParantheses)
&& (termOnTopOfStack.PrecedenceLevel > (term as Operator).PrecedenceLevel))
{
postfixExpr.Add(obStack.Pop());
continue;
}
}

obStack.Push(term);


All Green! Woohoo! However the method seems to have grown to an uncomfortable size for me... maybe I can refactor.

  • Rename method to ExpressionConverter#ConvertInfixToPostfix()

  • Stack always contains Operators.. So we can do away the casting...Stack stackOfOperators

  • Replace ‘term.ToString() == ")" ‘ with type-based checks

  • Each of ( and ) is a parenthesis together they make a set of parentheses. http://en.wikipedia.org/wiki/Parenthesis... well you live, you read wikis, you learn. Correct typos

  • Extract some methods to clarify intent.
    Had to promote the stack and postfixExpr variables to class members. (All the extracted methods required the stack and List variables to be passed in)



Looking good. I'll post the refactored version shortly. Let's try another expression/test.


[Row("0-12000-(16*4)-20", "0 12000 - 16 4 * - 20 -")]


Red again! [A few brain cycles later]
This breaks because the converter doesn’t recognize Left to Right precedence for operators at the same level. (In 0 - 5 + 3, - should be performed first to get the result of -2 even though precedence-wise + and - are equivalent.)
But this can be easily solved by replacing > with >= where we compare operator precedence. Here’s the final version of the class.


public class ExpressionConverter
{
List<object> m_postfixExpr = new List<object>();
Stack<Operator> m_stackOfOperators = new Stack<Operator>();
public List<Object> ConvertInfixToPostfix(string sInfixExpr)
{
m_postfixExpr.Clear(); m_stackOfOperators.Clear();

Tokenizer parser = new Tokenizer(sInfixExpr);
object term = parser.GetNextToken();
while (term != null)
{
if (term is int)
{
m_postfixExpr.Add(term);
}
else if (term == Operator.CloseParenthesis)
{
MoveOperatorsOnStackToPostfixExprTillMatchingOpenParenthesisIsFound();
}
else
{
Operator operatorTerm = term as Operator;
if (OperatorOnTopOfStackHasHigherOrSamePrecedenceAs(operatorTerm))
{
m_postfixExpr.Add(m_stackOfOperators.Pop());
continue;
}

m_stackOfOperators.Push(operatorTerm);
}

term = parser.GetNextToken();
}

PopAllRemainingOperatorsOnStackToPostfixExpr();
return m_postfixExpr;
}

private bool OperatorOnTopOfStackHasHigherOrSamePrecedenceAs(Operator term)
{
if (m_stackOfOperators.Count() == 0)
return false;

Operator termOnTopOfStack = m_stackOfOperators.Peek();
return ((termOnTopOfStack != Operator.OpenParenthesis)
&& (termOnTopOfStack.PrecedenceLevel >= term.PrecedenceLevel));
}

private void MoveOperatorsOnStackToPostfixExprTillMatchingOpenParenthesisIsFound()
{
Operator term = m_stackOfOperators.Pop();
while (term != Operator.OpenParenthesis)
{
m_postfixExpr.Add(term);
term = m_stackOfOperators.Pop();
}
}

private void PopAllRemainingOperatorsOnStackToPostfixExpr()
{
while (m_stackOfOperators.Count > 0)
m_postfixExpr.Add(m_stackOfOperators.Pop());
}
}


Code is good. End of post. In the next post, I'll write a console 'main' app to exercise the converter and evaluate the expression.

Feb 2, 2009

Test driving an infix to postfix expression converter (1 of 2)

I've been wanting to understand infix to postfix conversion - strangely intriguing to me. Here's what I'm about to ramble on for a while today...
Given an expression in infix form (operators appear inline in the expression) like
3 * (2 + 5)
Transform it to its postfix equivalent
3 2 5 + *


This form can be easily evaluated by a computer. You keep pushing terms onto a stack.. everytime you hit a operator, you pop the last 2 values and perform the operation, push the result back on the stack. In time, you will run out of terms and have the result at the top of the stack.
e.g. Stack builds from left to right (TOS is extreme right)
Read 3 Stack [3]
Read 2 Stack [3 2]
Read 5 Stack [3 2 5]
Read + Stack [3 7] #Pop 5,2 Perform 2+5, Store 7 back on stack
Read * Stack [21] # Pop 7,3 Perform 3*7, Store 21 back on stack.


Time and again I've procrastinated.. Then I hit Bart De Smet's blog post dated Oct 2006 - It's time! :). You may want to read up on the algorithm to parse an infix expression.

Just to make things interesting I'm going to do it TDD style.. red-green-refactor to glory. I'm trying to shorten the post to the essence.

I've TDDed myself a Tokenizer, a class that parses a string, uses a couple of regular expressions and returns the next number/operator with each call to GetNextToken().

Tokenizer.cs


public class Tokenizer
{
string m_sInputData;
public Tokenizer(string sInput)
{ m_sInputData = sInput; }

public object GetNextToken()
{
Regex regexNumber = new Regex(@"^\s*(\d+)");
Match obResults = regexNumber.Match(m_sInputData);
if (obResults.Success)
{
Group obGroup = obResults.Groups[1];
RemoveReadCharacters(obGroup);
return Int32.Parse(obGroup.Value);
}

Regex regexOperator = new Regex(@"^\s*([-+*/()])");
obResults = regexOperator.Match(m_sInputData);
if (obResults.Success)
{
Group obGroup = obResults.Groups[1];
RemoveReadCharacters(obGroup);
returnobGroup.Value;
}

return null;
}

private void RemoveReadCharacters(Group obMatchedGroup)
{
m_sInputData = m_sInputData.Remove(0, obMatchedGroup.Index + obMatchedGroup.Length);
}
}


The tokenizer has around 20 tests. I'll post one of the latter tests to illustrate usage

TestTokenizer.cs

[Test]
public void ReadMultipleTokensWithParantheses()
{
Tokenizer t = new Tokenizer(" ( 1 + 2 ) * (3 - (8/2))");
object[] arrExpectedTokens = new object[] { "(", 1, "+", 2, ")", "*", "(", 3, "-", "(", 8, "/", 2, ")", ")" };
foreach (object token in arrExpectedTokens)
{
Assert.AreEqual(token, t.GetNextToken());
}
}


All set to start with the meat of the problem: ExpressionConverter. So we'll come up with out first test employing some wishful thinking. The output is going to be reordered sequence of numbers and operators.. hence List<objects>

TestExpressionConverter.cs

[Test]
public void TestConvert_Simple()
{
ExpressionConverter converter = new ExpressionConverter();
List<object> postfixExpr = converter.ConvertToPostfix(" 2 + 3 ");
StringBuilder sb = new StringBuilder();
foreach(object term in postfixExpr)
{
sb.AppendFormat("{0} ", term.ToString());
}
Assert.AreEqual("2 3 +", sb.ToString());
}


To make this compile, I need to define the ExpressionConverter class and the ConvertToPostfix method within it. Something like this

public List<Object> ConvertToPostfix(string sInfixExpr)
{
List<object> postfixExpr = new List<object>();
Stack<object> obStack = new Stack<object>();

Tokenizer parser = new Tokenizer(sInfixExpr);
object term = parser.GetNextToken();
while (term != null)
{
if (term is int)
{ postfixExpr.Add(term); }
else
{ obStack.Push(term); }

term = parser.GetNextToken();
}

while (obStack.Count > 0)
postfixExpr.Add(obStack.Pop());

return postfixExpr;
}


The simplest thing that works. Collect all numeric terms in the result List<>, push operators onto the stack. At the end, pop contents of the stack into the List. Green!
Let's turn on the heat, see how it deals with grouping. Since the test is basically the same, albeit with diff input and hence output, I found an unorthodox way to use the RowTest NUnit extension I blogged about earlier. I make a couple of changes to the test like this... The last parameter in the list is the ExpectedValue for the test. So now I can run the same test with different input and expected values by adding Row Attributes. Might be a bit hard to read for the new guy.. but not too bad.



[RowTest]
[Row(" 2 + 3 ", "2 3 +")]
[Row(" 2 + (30 + 50 ) ", "2 30 50 + +")]
public void TestConvertInfixToPostfix(string sInfixExpr, string sExpectedPostfixExpr)
{
ExpressionConverter converter = new ExpressionConverter();
List<object> postfixExpr = converter.ConvertInfixToPostfix(sInfixExpr);

StringBuilder sb = new StringBuilder();
foreach(object term in postfixExpr)
{
sb.AppendFormat("{0} ", term.ToString());
}
Assert.AreEqual(sExpectedPostfixExpr, sb.ToString().Trim());
}


This fails as expected. Our code doesn't do parentheses yet. When we read a ), we need to pop out all operators till the matching (. Both ( ) are discarded and never part of the postfix expression. We can make this change in ConvertToPostfix()

if (term is int)
{
postfixExpr.Add(term);
}

else if (term.ToString() == ")")
{
object poppedTerm = obStack.Pop();
while (poppedTerm.ToString() != "(")
{
postfixExpr.Add(poppedTerm);
poppedTerm = obStack.Pop();
}
}

else
{
obStack.Push(term);
}


and back to Green. Lets take it up a notch. Add a new Row attribute / new test with the following expression

[Row(" ( (10+20) + 30 ) * 20-8/4 ", "10 20 + 30 + 20 * 8 4 / -")]


The test fails. NUnit reports:
TestDynCalc.TestExpressionConverter.TestConvertToPostfix( ( (10+20) + 30 ) * 20-8/4 , 10 20 + 30 + 20 * 8 4 / -):
String lengths are both 25. Strings differ at index 16.
Expected: "10 20 + 30 + 20 * 8 4 / -"
But was: "10 20 + 30 + 20 8 4 / - *"
---------------------------^


Bitten by operator precedence.. * has precedence over -. Is this the end for our converter? Not quite.. Onto Part 2


Feb 1, 2009

Agile gone wild (due apologies to Skid Row)

Been watching from the sidelines as to how Agile evolves into mainstream software development...
Its funny how everything is changing (w.r.t the latest fad) and simultaneously how the factors behind the success of the initial methodology starting teams stay the same.. good people with skill & discipline, who communicate and relish-and-respond to feedback combined with an environment that supports their cause.

Collecting a set of links that chronicles the current state of "Agile" - whatever little of the original intent is left of it post the rampant SemanticDiffusion

  1. *** James Shore - The decline and fall of Agile
  2. ** Martin Fowler - Flaccid Scrums
  3. *** Ron Jeffries - Context my Foot!
  4. *** Ron Jeffries - Discovering Essential Technical Practices