My conversations with gullible machines...

How to compile ruby from source on Windows?

I wanted to play with the latest version of ruby. I got the win32 binary zip of 1.9.1 p0 - however irb used to popup an error message on startup - used it for sometime but then I decided to fix it. Checked back on ruby's download page and found an updated version p129. Downloaded it but it was a source zip.

I wasn't sure that compiling it from source would be trivial... however it reinforced the simplicity of the ruby way. Summarizing the steps I found on this post at ruby-forum.com

Get the source zip from the download page and unzip it locally to say c:\ruby-src
Open the Visual Studio Command prompt (I have VS2008 on WinXP SP2)
Make a folder for your build say c:\ruby-build. Move into it / change dir
c:\ruby-build> c:\ruby-src\ruby-1.9.1-p129\win32\configure.bat
c:\ruby-build> nmake
c:\ruby-build> nmake test
c:\ruby-build> nmake DESTDIR=c:/ruby-build install

That's it.. you'll find the binaries within a usr folder within the ruby-build folder.

How to create a global (system) windows hook ?

What are Windows Hooks? You probably don't want to know that :) Good starting points are
The MSDN door to hooks
Win32 Hooks

A global / system hook would be “I want to be called when anyone does X in any process” ; the other kind is when “I want to be called when anyone does X in ThreadID#Y”. Creating a global windows hook cannot be done in managed code except (some keyboard/mouse hooks) via P/Invoke (so save yourself some time although local hooks are possible).
Windows would callback on the Hook function ; also called the filter function. The signature -


LRESULT CALLBACK FilterFunctionForHook(int code, WPARAM wParam, LPARAM lParam)

For global hooks, the filter function should be contained in an unmanaged DLL – I created one called WinHookFacade.dll. This unmanaged DLL would be injected into each process. Also the copy of the DLL loaded into other process cannot be explicitly unloaded – will unload when the process goes down. Also any bad code in the hook function can result in bad scary things and may require the user to go through a logoff-login or reboot cycle. Avoid if possible because global hooks will extract their pound of performance. But when has that stopped us programmers ? ;) Hence exercise caution with global hooks.

To create a global hook, you therefore need to create an exe project that hooks/unhooks the filter function and a dll project that contains the filter function.

The Hook


void Hook()
{
HINSTANCE hCurrentDll = GetModuleHandle(_T("WinHookFacade.dll"));
g_HookHandle = SetWindowsHookEx(WH_CBT,
FilterFunctionForHook,
hCurrentDll,
0);

if (g_HookHandle == NULL)
throw new std::exception("Unable to hook");
}

The key here would be the SetWindowsHookEx Win32 API function. The first parameter is what type of hook you’d like to register for – See this page for available types. In this example, WH_CBT means events related to windows (creation,activation,destructions etc). The second parameter is the name of the Hook/Filter function that shall be called back by the OS – detailed below. The third parameter is a handle to the DLL containing the hook function for global hooks (if null, it indicates a thread-specific hook). The fourth parameter is 0 for global hooks (for thread/local hooks, it is the Thread ID you want to hook into).

The Callback


LRESULT CALLBACK FilterFunctionForHook(int code, WPARAM wParam, LPARAM lParam)
{
if (code >= 0)
{
   //your code e.g.
   switch(code)
   {
      case HCBT_ACTIVATE:
     // log the window title                  
     ...
}
return CallNextHookEx(g_HookHandle, code, wParam, lParam);
}

This function would be called in the process in which the event occurs (e.g. where a window is created). Process Isolation : a process cannot run custom code into another process; hence DLL injection. Also the CallNextHookEx Win32 API function must be called to pass control onto the next filter function in queue passing it the original arguments received to be a good citizen. The first parameter is the Hook Handle obtained when you registered the Hook function.

IMPORTANT: However this value must be the same across all instances of the shared Hook DLL & must be available within the filter function. For this reason, you need to create the “hook handle” in a shared data segment Whaa.. how do I do that? Declare your HHOOK variable like this


#pragma data_seg (".MY_HOOK_DATA")
HHOOK g_HookHandle = 0;
#pragma data_seg()

This function is supposed to call the next function in queue without doing anything if the code argument is less than zero. If not, the function may execute custom code to do specific tasks. The wParam and lParam arguments are pointers to handles/structures containing additional info. The exact type of structure depends on the value of the code parameter – check msdn for the page for the specific alias of the filterfunction e.g. in my case it would be CBTPROC; wParam is a handle to the window and lParam is a pointer to a struct

Cleanup


void Unhook()
{
 if (!UnhookWindowsHookEx(g_HookHandle))
     throw new std::exception("Unhook failed!");

 g_HookHandle = NULL;
}

Here the primary player is the UnhookWindowsHookEx Win32 API Function. In case of an error, you’d need to use the fugly GetLastError() function to know what really went wrong.

Test Drive

Now to take this for a spin, create an exe project referencing the DLL containing the hook function. The main function will call on the exported functions as shown below..


int _tmain(int argc, _TCHAR* argv[])
{
  char buffer[256];
  try
  {
      printf("Hooking...");
      Hook();
      printf("Done.\n");
    
    
      printf("Press Enter to exit\n");      // hook is running
      gets(buffer);

      printf("Unhooking...");
      Unhook();
      printf("Done. Woohoo!!\n");
  …

That’s all there is to it.
CAUTION: Also if you get something wrong, the effects would be felt immediately – in my case, the UI would freeze as soon as my botched hook function was registered; my taskbar would be nuked and random error dialogs from running apps. But nothing a reboot can’t solve! C++ string compiler errors, WinAPI gotchas & hooks -- It was like walking on a tightrope in dark windy conditions. <runs back to the managed side>

Ruby is slow... until you learn to write good code.

I wrote up a small program to find prime-numbers between x and y ; implemented the Sieve of Eratosthenes.
The SOE is basically where you take all numbers (say 1-100) into an array. Cross out 1. Loop from 2-Sqrt(N) i.e. 2-10. Now cross out every multiple of 2 in the array. Next cross out every multiple of 3.. and so on. Finally all the numbers that aren't crossed out are prime.

However I did a very simplistic implementation (test-driven) to boot. All the tests did pass.. however running it to get all primes between 10K and 100K took 34 secs to execute. And I was on the new n improved ruby 1.9 with a better VM. I switched to the stable ruby 1.8.6 and it took ~60 secs. So now I turned back to my primary Weapon C#. The same steps implemented in C# took 722 msecs. So now there were 2 possibilities
1. Ruby is sloooow (the usual gripe)
2. Something in my algorithm is messed up.

So I posted my source onto StackOverflow - (my current fave time sink :) to get some eyeballs to look at the problem.
So here's my naive implementation

class PrimeGenerator
  def self.get_primes_between( x, y)
    sieve_array = Array.new(y) {|index|
      (index == 0 ? 0 : index+1)
    }

    position_when_we_can_stop_checking = Math.sqrt(y).to_i
    (2..position_when_we_can_stop_checking).each{|factor|
      sieve_array[(factor).. (y-1)].each{|number|
        sieve_array[number-1] = 0 if isMultipleOf(number, factor)
      }
    }
    
    sieve_array.select{|element| 
      ( (element != 0) && ( (x..y).include? element) )
    }
  end
  def self.isMultipleOf(x, y)
    return (x % y) == 0
  end
end

# Benchmarking code


require 'benchmark'
Benchmark.bm(30) do |r|
  r.report("Gishu") { a = PrimeGenerator.get_primes_between( 1_000, 10_000) }
end

A few people got back with what might be the problem.
One of the first and most voted issue was the slicing the array to get a new subset array. That's got to be expensive inside a loop for an array of this size. Obviously I was using for loops in C# (since C# doesn't have Ruby's each)

for index in factor..y-1 do
  sieve_array[index] = 0 if isMultipleOf(index+1, factor)
end

That shaved off 8 secs - 24%. But still 26 secs. Another optimization was to reduce iterations - instead of checking each number with IsMultiple(number, _loopVar), set the 'step' for the loop to equal _loopVar (so from 2 you go 4,6,8,10...)

And finally Mike Woodhouse dispatched a homer. He posted a tiny snippet that did the same thing in under 1 sec. No that's not a typo.

def sieve_to(n)
  s = (0..n).to_a
  s[0]=s[1]=nil
  s.each do |p|
    next unless p
    break if p * p > n
    (p*p).step(n, p) { |m| s[m] = nil }
  end
  s.compact
end

# Usage:
# puts sieve_to(11).select{|x| x>=3}.inspect


        user     system      total        real
Gishu  27.219000   0.000000  27.219000 ( 27.375000)
Mike    0.141000   0.000000   0.141000 (  0.140625)

I then had to know how he did it..

Smart#1: He skips if array[_loopvar] is already 0. whereas if array[6] is crossed out due to 3, I did a scanning run to set all multiples of 6 to 0 which were already 0.(now down to 6 secs)

Smart#2: Use the Numeric#step trick to jump to the next multiple vs traversing entire list with an IsMultiple check (now down to 0.25 secs)

Smart#3: He begins the inner loop with Square of p. e.g if _loopvar is 7, I used to check from 8->end. Mike checks from 49->end. This one was a little tricky to figure out.. the reason is 7x2, 7x3, 7x4, 7x5, 7x6 have already been crossed out when _loopVar was 2,3,4,5,6 respectively. (now down to 0.21 secs)

Smart#4: Smart use of Ruby's nil for crossed out vs my choice of 0. That followed by Array#compact reduces the number of elements for the final select operation to get primes between x and y (now down to 0.18 secs.)

So after all that it's Ruby 0.15 secs and C# (with same optimizations) 0.028 secs, much more easy to take in.

An interesting thought to end this post is that even with the naive first implementation C# performed reasonably well... under a sec. I'd have moved on to the next task. Ruby on the other hand slowed to a crawl forcing me to take another look for optimizations. In a indirect sort of way, C#'s superior performance could be hiding bad/substandard solutions and letting programmers get away with it...

MergeSort in Ruby

Finally got the ball rolling on the 'Introduction to Algorithms' book.. What I figured was that I implement the sorting algorithms in ruby - help me get better at ruby AND algorithms.
I've got to confess.. I'm test-infected so I'm gonna TDD my way out. This series of posts are not going to be in a walkthrough style of narration.. I'm just going to post complete samples. The order of methods indicate how the code was built up.. (the first method is the 'oldest'.

Here's merge_sort_test.rb [The Test Fixture]
--

require 'test/unit'
require 'merge_sort'
class TestMergeSort < Test::Unit::TestCase
  def test_partition_array_with_even_number_of_elements
    left, right = MergeSort.partition_array [11, 5, 9, 15]
    assert_equal [11, 5], left
    assert_equal [9, 15], right
  end
  def test_partition_array_with_odd_number_of_elements
    left, right = MergeSort.partition_array [11, 5, 45, 9, 15]
    assert_equal [11, 5], left
    assert_equal [45, 9, 15], right
  end
  
  def test_merge_arrays
    assert_equal( [10,20,30,40], 
      MergeSort.merge_arrays( [10,30], [20,40] ) )
    
    assert_equal( [1, 3, 5, 9], 
      MergeSort.merge_arrays( [1, 5, 9], [3] ) )
    assert_equal( [1, 3, 5, 9], 
      MergeSort.merge_arrays( [3], [1,5,9] ) )
  end
    
  def test_sort
    assert_equal [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],  
      MergeSort.sort([ 8, 3, 4, 0, 9, 1, 2, 7, 6, 5])
  end
end

--
merge_sort.rb [The Implementation]
--

class MergeSort 
  def self.partition_array( input_array)
    mid_index = (input_array.length / 2) - 1
    [input_array[0..mid_index], input_array[(mid_index+1)..(input_array.length-1)]]
  end
  
  def self.merge_arrays(array1, array2)
    merged_array = []

    until (array1.empty? || array2.empty?) do
      if (array1[0] < array2[0])
        merged_array << array1.shift
      else
        merged_array << array2.shift
      end
    end
    
    merged_array + array1 + array2
  end
  
  def self.sort( input_array )
    return input_array if (input_array.length < 2)
    
    left, right = self.partition_array input_array
    sortedLeft = self.sort left
    sortedRight = self.sort right
    y = self.merge_arrays( sortedLeft, sortedRight )
  end
end

--
A command line 'runner'/'main' function
--

# Console runner / main
if !ARGV[0].nil?
  input = []
  ARGV.each{|x| input << x.to_i}
  print MergeSort.sort(input).inspect
end

--
>ruby merge_sort.rb 90, 100, 40, 10, 45
[10, 40, 45, 90, 100]

And we're done! IMHO The Ruby Array class really shines here w.r.t. readability.

Dynamic evaluation of an expression

In this post, I'll tackle Step#2 of evaluating an expression like 2 * (3 + 5) and return 16.

Convert expression from infix to postfix (see the last 2 posts)
Evaluate postfix expression to determine result

As always, write a new test for ExpressionConverter


[Test]
public void TestEvaluate()
{
   ExpressionConverter converter = new ExpressionConverter();
   Assert.AreEqual(10, converter.EvaluateInfixExpression("((1+2) + 3) * 8/4"));
}

The implementation is very simple now that we have the ability to 'convert to postfix'.


public object EvaluateInfixExpression(string sInfixExpression)
{
   List<object> postfixExpression = this.ConvertInfixToPostfix(sInfixExpression);
   Stack<object> stack = new Stack<object();
   foreach (object term in postfixExpression)
   {
       if (term is int)
       {
           stack.Push(term);
       }
       else
       {
          Operator operatorTerm = term as Operator;
          int iValue2 = (int) stack.Pop();
          int iValue1 = (int) stack.Pop();
          stack.Push( operatorTerm.Evaluate(iValue1, iValue2) );
       }
   }
   return stack.Peek();
}

Except that we don't have Operator#Evaluate()... Lets get over that hurdle. Mark this test with a temp Ignore tag. Write a new test in the TestOperator fixture


[Test]
public void TestPlusOperator_Evaluate()
{
   Assert.AreEqual(30, Operator.Plus.Evaluate(10, 20));
}

If each operator could have a associated evaluate code block that evaluated 2 parameters and returned the result, we'd be done. A new member variable, a new parameterized ctor and a new code block later.. Here are the changes


public class Operator
{
   public static readonly Operator Plus = new Operator("+", 1, (x, y) => x + y );
   private Func<int,int,int> m_evaluateDelegate;

   private Operator(string sSymbol, int iPrecedenceLevel, Func<int,int,int> evaluateDelegate):this(sSymbol, iPrecedenceLevel)
   {
       m_evaluateDelegate = evaluateDelegate;
   }

   public int Evaluate(int iValue1, int iValue2)
   {
      return m_evaluateDelegate(iValue1,iValue2);
   }

Green. Ditto for the other operators -, *, /
All done. Finally a console app to do this


static void Main(string[] args)
{
   ExpressionConverter expr = new ExpressionConverter();

   Console.WriteLine("Postfix version is ");
   foreach (object term in expr.ConvertInfixToPostfix(args[0]))
   {   Console.Write(term + " ");   }
   Console.WriteLine();

   Console.WriteLine("Expression evaluated to " + expr.EvaluateInfixExpression(args[0]) );
}

This is how it works.
Release>DynCalc.exe "((1+2) + 3) * 2 - 8/4"
Postfix version is
1 2 + 3 + 2 * 8 4 / -
Expression evaluated to 10

Now ExpressionConverter is no longer a converter.. we should probably rename it to Expression. Also it has no instance state. All methods can therefore be converted to static methods - no need to create an instance to invoke members. Done.
Other than non-happy paths like malformed expressions.. (can be done) it's all good. End of series. I had fun..

Test driving an infix to postfix expression converter (2 of 3)

So at the end of Post 1 of this series. we've a converter that doesn't yet understand operator precedence.
Let me refresh my memory before proceeding..
An operator token belongs to a class/family of 'Operators', has a string representation like *, +, /, - and finally has a relative precedence associated with it.
1 for +, -
2 for *, /
3 for (, )

Seems like we need an Operator class instead of just a string. I'll mark the failing test with [Ignore("Detour")] tag while we go make some operators.
Create a new test fixture, imagine a magical Operator class.. something like this

TestOperator.cs


[Test]
public void TestParse_PlusOperator()
{
Operator plusOp = Operator.Parse("+");
Assert.IsNotNull(plusOp);
Assert.AreEqual("+", plusOp.ToString());
Assert.AreEqual(1, plusOp.PrecedenceLevel);
}

I turn my ideas about Operators into a Type/Class.


public class Operator
{
private string m_OperatorSymbol;
private int m_iPrecedenceLevel;
private static readonly Operator plusOperator = new Operator("+", 1);

private Operator(string sSymbol, int iPrecedenceLevel)
{
   m_OperatorSymbol = sSymbol;
   m_iPrecedenceLevel = iPrecedenceLevel;
}

public int PrecedenceLevel
{
   get { return m_iPrecedenceLevel; }
}
public override string ToString()
{   return m_OperatorSymbol;    }


public static Operator Parse(string stringToParse)
{   return Operator.plusOperator;    }
}

Green! I took the quick way out with the Parse implementation.. I'll let the coming tests force me to distill that bit of code. I add similar tests for Minus (and then Multiply and Divide). I end up with a need for a 'switch case' in Parse().
Also Parse should probably throw if its passed a token that it doesnt recognize. I'll note that down as a test too.


[Test]
[ExpectedException(ExceptionType=typeof(ArgumentException), ExpectedMessage="Unknown Operator! Cannot parse Gishu")]
public void TestParse_InvalidInput()
{
Operator noOp = Operator.Parse("Gishu");
Assert.Fail("Operator learnt some AI to parse Gishu. This should have blown up by now!");
}

[Furious typing] Alright now Operator.Parse() looks like this.. all its tests pass.


public static Operator Parse(string stringToParse)
{
switch (stringToParse)
{
   case "+":
       return Operator.plusOperator;
   case "-":
       return Operator.minusOperator;
   case "*":
       return Operator.multiplyOperator;
   case "/":
       return Operator.divisionOperator;

   default:
       throw new ArgumentException("Unknown Operator! Cannot parse " + stringToParse);
}
}

Now lets update our tokenizer to use this new type. This looks like it might break a few tests.. but with baby steps and a vigilant eye, it'll all be better by the time I'm done with it. I'll start by modifying one of my initial tokenizer tests...


[RowTest]
[Row("+")]
[Row(" + ")]
public void ReadPlusOperator(string sInput)
{
Tokenizer t = new Tokenizer(sInput);
  
//Assert.AreEqual("+", t.GetNextToken());
Assert.AreEqual(Operator.Plus, t.GetNextToken());
}

This would require us to make the static plusOperator member public. The other option is to check each member of the operator object… which would be duplication of the testfixture for Operator. So lets share our operators with the world.. make em public and rename.


public class Operator
{
public static readonly Operator Plus = new Operator("+", 1);
//...

And we're red...
TestDynCalc.TestTokenizer.ReadPlusOperator( + ):
Expected: <+>
But was: "+"

Instead of making a big bang change.. let’s make an ugly hack inside Tokenizer#GetNextToken() to make just this test pass..


Regex regexOperator = new Regex(@"^\s*([-+*/()])");
obResults = regexOperator.Match(m_sInputData);
if (obResults.Success)
{
Group obGroup = obResults.Groups[1];
RemoveReadCharacters(obGroup);

//return obGroup.Value;
if (obGroup.Value != "+")
    return obGroup.Value;
else
    return Operator.Plus;

}

I broke 2 tests - that tested the tokenizer with complete expressions containining +. Turns out its easy to fix them.


[Test]
public void ReadMultipleTokens()
{
Tokenizer t = new Tokenizer(" 2 + 3 - 34 * 2 / 1 ");
//object[] arrExpectedTokens = new object[] { 2, "+", 3, "-", 34, "*", 2, "/", 1 };
object[] arrExpectedTokens = new object[] { 2, Operator.Plus, 3, "-", 34, "*", 2, "/", 1 };
foreach (object token in arrExpectedTokens)
{
    Assert.AreEqual(token, t.GetNextToken());
}
}

All green. Excuse me while I repeat the procedure for all the other operators. The hack in Tokenizer#GetNextToken() can now be replaced by ...


return Operator.Parse(obGroup.Value);

However we forgot the parantheses bros. while creating our operators. We need more operators! Mark Tokenizer with Ignore(“2 TDDed operators coming right up…”).. [more furious typing] we have new operators. Take off the ignore tag. Similar changes… and Tokenizer is now Operator-aware.
Time to take off the ignore tag on ExpressionConverter tests and end our little detour.
So we're back to red.. but this time, we've operators with precedence. [after some mental acrobatics with a pencil and an eraser on paper]
The key to getting this is to check if the operator on top of stack is not a '(' and has higher precedence than the incoming/current token. If yes, pop the stack, write popped operator to output and retry. If no, push the incoming token onto the stack.
So + should not be pushed if the top of stack[TOS] contains *; however * should be if TOS is +.
So we add some more code to the final else block in ExpressionConverter#ConvertToPostfix..


if (obStack.Count() > 0)
{
Operator termOnTopOfStack = obStack.Peek() as Operator;
if ((termOnTopOfStack != Operator.OpenParantheses)
    && (termOnTopOfStack.PrecedenceLevel > (term as Operator).PrecedenceLevel))
{
    postfixExpr.Add(obStack.Pop());
    continue;
}
}

obStack.Push(term);

All Green! Woohoo! However the method seems to have grown to an uncomfortable size for me... maybe I can refactor.

Rename method to ExpressionConverter#ConvertInfixToPostfix()

Stack always contains Operators.. So we can do away the casting...Stack stackOfOperators

Replace ‘term.ToString() == ")" ‘ with type-based checks

Each of ( and ) is a parenthesis together they make a set of parentheses. http://en.wikipedia.org/wiki/Parenthesis... well you live, you read wikis, you learn. Correct typos

Extract some methods to clarify intent.
Had to promote the stack and postfixExpr variables to class members. (All the extracted methods required the stack and List variables to be passed in)

Looking good. I'll post the refactored version shortly. Let's try another expression/test.


[Row("0-12000-(16*4)-20", "0 12000 - 16 4 * - 20 -")]

Red again! [A few brain cycles later]
This breaks because the converter doesn’t recognize Left to Right precedence for operators at the same level. (In 0 - 5 + 3, - should be performed first to get the result of -2 even though precedence-wise + and - are equivalent.)
But this can be easily solved by replacing > with >= where we compare operator precedence. Here’s the final version of the class.


public class ExpressionConverter
{
 List<object> m_postfixExpr = new List<object>();
 Stack<Operator> m_stackOfOperators = new Stack<Operator>();
 public List<Object> ConvertInfixToPostfix(string sInfixExpr)
 {
     m_postfixExpr.Clear(); m_stackOfOperators.Clear();

     Tokenizer parser = new Tokenizer(sInfixExpr);
     object term = parser.GetNextToken();
     while (term != null)
     {
         if (term is int)
         {
             m_postfixExpr.Add(term);
         }
         else if (term == Operator.CloseParenthesis)
         {
             MoveOperatorsOnStackToPostfixExprTillMatchingOpenParenthesisIsFound();
         }
         else
         {
             Operator operatorTerm = term as Operator;
             if (OperatorOnTopOfStackHasHigherOrSamePrecedenceAs(operatorTerm))
             {
                 m_postfixExpr.Add(m_stackOfOperators.Pop());
                 continue;
             }

             m_stackOfOperators.Push(operatorTerm);
         }

         term = parser.GetNextToken();
     }

     PopAllRemainingOperatorsOnStackToPostfixExpr();
     return m_postfixExpr;
 }

 private bool OperatorOnTopOfStackHasHigherOrSamePrecedenceAs(Operator term)
 {
     if (m_stackOfOperators.Count() == 0)
         return false;

     Operator termOnTopOfStack = m_stackOfOperators.Peek();
     return ((termOnTopOfStack != Operator.OpenParenthesis)
             && (termOnTopOfStack.PrecedenceLevel >= term.PrecedenceLevel));
 }

 private void MoveOperatorsOnStackToPostfixExprTillMatchingOpenParenthesisIsFound()
 {
     Operator term = m_stackOfOperators.Pop();
     while (term != Operator.OpenParenthesis)
     {
         m_postfixExpr.Add(term);
         term = m_stackOfOperators.Pop();
     }
 }

 private void PopAllRemainingOperatorsOnStackToPostfixExpr()
 {
     while (m_stackOfOperators.Count > 0)
         m_postfixExpr.Add(m_stackOfOperators.Pop());
 }
}

Code is good. End of post. In the next post, I'll write a console 'main' app to exercise the converter and evaluate the expression.

Test driving an infix to postfix expression converter (1 of 2)

I've been wanting to understand infix to postfix conversion - strangely intriguing to me. Here's what I'm about to ramble on for a while today...
Given an expression in infix form (operators appear inline in the expression) like
3 * (2 + 5)
Transform it to its postfix equivalent
3 2 5 + *

This form can be easily evaluated by a computer. You keep pushing terms onto a stack.. everytime you hit a operator, you pop the last 2 values and perform the operation, push the result back on the stack. In time, you will run out of terms and have the result at the top of the stack.
e.g. Stack builds from left to right (TOS is extreme right)
Read 3 Stack [3]
Read 2 Stack [3 2]
Read 5 Stack [3 2 5]
Read + Stack [3 7] #Pop 5,2 Perform 2+5, Store 7 back on stack
Read * Stack [21] # Pop 7,3 Perform 3*7, Store 21 back on stack.

Time and again I've procrastinated.. Then I hit Bart De Smet's blog post dated Oct 2006 - It's time! :). You may want to read up on the algorithm to parse an infix expression.

Just to make things interesting I'm going to do it TDD style.. red-green-refactor to glory. I'm trying to shorten the post to the essence.

I've TDDed myself a Tokenizer, a class that parses a string, uses a couple of regular expressions and returns the next number/operator with each call to GetNextToken().

Tokenizer.cs


public class Tokenizer
{
string m_sInputData;
public Tokenizer(string sInput)
{    m_sInputData = sInput;    }

public object GetNextToken()
{
Regex regexNumber = new Regex(@"^\s*(\d+)");
Match obResults = regexNumber.Match(m_sInputData);
if (obResults.Success)
{
    Group obGroup = obResults.Groups[1];
    RemoveReadCharacters(obGroup);
    return Int32.Parse(obGroup.Value);
}

Regex regexOperator = new Regex(@"^\s*([-+*/()])");
obResults = regexOperator.Match(m_sInputData);
if (obResults.Success)
{
    Group obGroup = obResults.Groups[1];
    RemoveReadCharacters(obGroup);
    returnobGroup.Value;
}

return null;
}

private void RemoveReadCharacters(Group obMatchedGroup)
{
m_sInputData = m_sInputData.Remove(0, obMatchedGroup.Index + obMatchedGroup.Length);
}
}

The tokenizer has around 20 tests. I'll post one of the latter tests to illustrate usage

TestTokenizer.cs


[Test]
public void ReadMultipleTokensWithParantheses()
{
Tokenizer t = new Tokenizer(" ( 1 + 2 ) * (3 - (8/2))");
object[] arrExpectedTokens = new object[] { "(", 1, "+", 2, ")", "*", "(", 3, "-", "(", 8, "/", 2, ")", ")" };
foreach (object token in arrExpectedTokens)
{
Assert.AreEqual(token, t.GetNextToken());
}
}

All set to start with the meat of the problem: ExpressionConverter. So we'll come up with out first test employing some wishful thinking. The output is going to be reordered sequence of numbers and operators.. hence List<objects>

TestExpressionConverter.cs


[Test]
public void TestConvert_Simple()
{
ExpressionConverter converter = new ExpressionConverter();
List<object> postfixExpr = converter.ConvertToPostfix(" 2 + 3 ");
StringBuilder sb = new StringBuilder();
foreach(object term in postfixExpr)
{
 sb.AppendFormat("{0} ", term.ToString());
}
Assert.AreEqual("2 3 +", sb.ToString());
}

To make this compile, I need to define the ExpressionConverter class and the ConvertToPostfix method within it. Something like this


public List<Object> ConvertToPostfix(string sInfixExpr)
{
List<object> postfixExpr = new List<object>();
Stack<object> obStack = new Stack<object>();

Tokenizer parser = new Tokenizer(sInfixExpr);
object term = parser.GetNextToken();
while (term != null)
{
 if (term is int)
 {   postfixExpr.Add(term);   }
 else
 {   obStack.Push(term);   }

 term = parser.GetNextToken();
}

while (obStack.Count > 0)
 postfixExpr.Add(obStack.Pop());

return postfixExpr;
}

The simplest thing that works. Collect all numeric terms in the result List<>, push operators onto the stack. At the end, pop contents of the stack into the List. Green!
Let's turn on the heat, see how it deals with grouping. Since the test is basically the same, albeit with diff input and hence output, I found an unorthodox way to use the RowTest NUnit extension I blogged about earlier. I make a couple of changes to the test like this... The last parameter in the list is the ExpectedValue for the test. So now I can run the same test with different input and expected values by adding Row Attributes. Might be a bit hard to read for the new guy.. but not too bad.


[RowTest]
[Row(" 2 + 3 ", "2 3 +")]
[Row(" 2 + (30 + 50 ) ", "2 30 50 + +")]
public void TestConvertInfixToPostfix(string sInfixExpr, string sExpectedPostfixExpr)
{
ExpressionConverter converter = new ExpressionConverter();
List<object> postfixExpr = converter.ConvertInfixToPostfix(sInfixExpr);

StringBuilder sb = new StringBuilder();
foreach(object term in postfixExpr)
{
  sb.AppendFormat("{0} ", term.ToString());
}
Assert.AreEqual(sExpectedPostfixExpr, sb.ToString().Trim());
}

This fails as expected. Our code doesn't do parentheses yet. When we read a ), we need to pop out all operators till the matching (. Both ( ) are discarded and never part of the postfix expression. We can make this change in ConvertToPostfix()


if (term is int)
{
postfixExpr.Add(term);
}

else if (term.ToString() == ")")
{
object poppedTerm = obStack.Pop();
while (poppedTerm.ToString() != "(")
{
  postfixExpr.Add(poppedTerm);
  poppedTerm = obStack.Pop();
}
}
else
{
obStack.Push(term);
}

and back to Green. Lets take it up a notch. Add a new Row attribute / new test with the following expression


[Row("  ( (10+20) + 30 ) * 20-8/4 ", "10 20 + 30 + 20 * 8 4 / -")]

The test fails. NUnit reports:
TestDynCalc.TestExpressionConverter.TestConvertToPostfix( ( (10+20) + 30 ) * 20-8/4 , 10 20 + 30 + 20 * 8 4 / -):
String lengths are both 25. Strings differ at index 16.
Expected: "10 20 + 30 + 20 * 8 4 / -"
But was: "10 20 + 30 + 20 8 4 / - *"
---------------------------^

Bitten by operator precedence.. * has precedence over -. Is this the end for our converter? Not quite.. Onto Part 2

Agile gone wild (due apologies to Skid Row)

Been watching from the sidelines as to how Agile evolves into mainstream software development...
Its funny how everything is changing (w.r.t the latest fad) and simultaneously how the factors behind the success of the initial methodology starting teams stay the same.. good people with skill & discipline, who communicate and relish-and-respond to feedback combined with an environment that supports their cause.

Collecting a set of links that chronicles the current state of "Agile" - whatever little of the original intent is left of it post the rampant SemanticDiffusion

*** James Shore - The decline and fall of Agile
** Martin Fowler - Flaccid Scrums
*** Ron Jeffries - Context my Foot!
*** Ron Jeffries - Discovering Essential Technical Practices

NUnit RowTest Extension : Running the same test with different inputs

Update 2010-03-10: The following extension has now been superseded by the TestCase attribute which is now a part of the NUnit core (v2.5 and later). (nunit.framework.dll)

For an equivalent code sample, see the end of the post.

End of Update

This extension allows you an elegant way of handing the scenario where you need to run with different sets of input values. Usually tests shouldn't take any inputs... (the test runner doesn't know what inputs to supply the test with).


[Test(Description="PlainVanilla")]
public void TestBasicMath()
{
Assert.AreEqual(1, 1, "Someone broke mathematics");
}

But then there are always 'exceptions to the rule'. For example I'm writing a class called the Tokenizer that reads tokens from an input string. So I give it "10 + 15", the first token returned by the class should be the number 10.
Now I need to exercise the test code block below with different inputs for sInput like "10", " 10 + 15"


Tokenizer t = new Tokenizer(sInput);
Assert.AreEqual( 2, t.GetNextToken() );

Now back in the old days, you'd need to write a test case for each possible data value. Now with Andreas Schlapsi's RowTest Extension which is bundled with NUnit, things are much simpler.
Prerequisites:

Needs NUnit 2.4.7 or later. I 'm using NUnit 2.4.8 for .Net 2.0. Get it here as always
Add a reference to the nunit.framework.extensions assembly (in addition to the usual nunit.framework to your test project


using NUnit.Framework;
using NUnit.Framework.Extensions;

namespace TestDynCalc
{
[TestFixture]
public class TestTokenizer
{

  [RowTest]
  [Row("10 + 15")]
  [Row("10")]
  [Row(" 10 +15", TestName = "WhiteSpaceBeforeFirstNumber")]
  [Row("+10+15", Description = "Read number from +10+15", ExceptionMessage = "No number found at beginning of input stream!", ExpectedException = typeof(ArgumentException))]
  public void ReadNumberFromInputString(string sInput)
  {
      Tokenizer t = new Tokenizer(sInput);
      Assert.AreEqual( 2, t.GetNextToken() );
  }

  [Test(Description="PlainVanilla")]
  public void TestBasicMath()
  {
      Assert.AreEqual(1, 1, "Someone broke mathematics");
  }
}
}

Whoa! Let me step through all that. The using directives are self explanatory.

The RowTest attribute (instead of Test) over a NUnit test method allows you to parameterize the test method and gets the ball rolling.
Next for every unique set of inputs, you need to run this test with, you add a Row attribute and specify the inputs as constructor arguments. (The extension is vocal about any mismatch between number of test method parameters and the number of inputs you supply. )
The Row Test also has some named parameters

TestName : Lets you specify a descriptive name for the specific 'sub-test'. See how the last child node of the RowTest has a different name in the the attached GUI Screenshot below.
Description: This seems to be broken. It's a NUnit feature.. allowing you to tag a test with some comments that will show up when you bring up Properties for the test case. (Right click > Properties)
ExpectedException, ExceptionMessage: Ideally I'd like this as a different test case. However you have the option to mark a set-of-inputs to indicate that 'this should cause this type of Exception with the following message'. See last Row attribute.

This is how the NUnit GUI renders a RowTest. Quite nice. (Of course, You should choose better names for your tests :) Each Row attribute is rendered as sub-node of the test with the relevant name and all the input params specified in brackets (comma seperated in case of multiple params).

Update

Moving from RowTest to TestCase: There are no big changes with using the new 2.5 TestCase attribute. You don't need an explicit RowTest attribute. Replace each Row attribute with a TestCase attribute.
The Exception properties have been renamed (inline with the 2.5 ExpectedException revamp. So check the docs on ExpectedException if you can't get something to work).. but easy to figure out via Intellisense.
Another improvement is an explicit Result property, which as you can guess would be used to verify the output of your test case. Before TestCase, you had to pass in another parameter in the RowTest named expectedOutput and take care to use it only for Asserting at the end of the test.


       [TestCase("10 + 15")]
       [TestCase("10")]
       [TestCase(" 10 +15", TestName = "WhiteSpaceBeforeFirstNumber")]
       [TestCase("+10+15", Description = "Read number from +10+15", ExpectedMessage = "No number found at beginning of input stream!", ExpectedException = typeof(ArgumentException))]
       public void ReadNumberFromInputString(string sInput)
       {
           Tokenizer t = new Tokenizer(sInput);
           Assert.AreEqual(2, t.GetNextToken());
       }

Non-obvious XPath: Getting elements that do not have a particular attribute or child element

Well sounds pretty simple.. Here's an example. Consider the following source XML document


<?xml version="1.0"?>
<Wizards>
<Wizard Name="Merlin" Level="5">
 <Spells>
   <Spell Name="SkinOfOil"/>
 </Spells>
</Wizard>
<Wizard Name="Apprentice">
 <Spells/>
</Wizard>
</Wizards>

So now to select all Wizards that do not have a Level attribute...


/Wizards/Wizard[not(@Level)]

Similarly to select all Wizards that do not know any Spells...


/Wizards/Wizard[not(Spells/Spell)]

The key here is the not function which reduces the argument to a boolean value. Once again XPath amazes me with its simplicity

Getting NUnit to go all STA

As I was trying to help out someone who had trouble with writing a unit test for a data bound WPF Window.. first of all I had to ref WindowsBase, PresentationCore and PresentationFramework and then I ran into a curt warning from WPF when you're trying to instantiate a WPF window in a NUnit test case.. (Grumble Grumble... I hate UI in unit tests)
TestProj.TestBindings.TestTextBoxBinding:System.InvalidOperationException : The calling thread must be STA, because many UI components require this.
What is STA? Something you only wish 'stays out of the way'.
Interesting.. this means... its not running in STA. Elementary. Now how do I get NUnit to go STA. It was a long arduous road.
First up if you have NUnit 2.5 or above, I believe its as easy as this.


[Test, RequiresMTA]
public void MTA()
{
   Assert.AreEqual(ApartmentState.MTA, Thread.CurrentThread.ApartmentState);
}

[Test, RequiresSTA]
public void STA()
{
   Assert.AreEqual(ApartmentState.STA, Thread.CurrentThread.ApartmentState);
}

If not, which is where I was. It's slightly more complicated.
First go to

C:\Program Files\NUnit 2.4.8\bin

Look for a file called NUnitTests.config. Next copy this over to your test dll folder which has TestProj.nunit (or make one.. you'll need it). Rename the copy of the file to 'TestProj.config'.. Open it up in an editor.. Time for adding bringing some more XML into the world.


<configuration>
  <configSections>
    <sectionGroup name="NUnit">
      <section name="TestRunner" type="System.Configuration.NameValueSectionHandler"/>
    </sectionGroup>
  </configSections>

  <NUnit>
    <TestRunner>
      <add key="ApartmentState" value="STA"/>
    </TestRunner>
  </NUnit>
  <!-- all the other fluff stays -->
</configuration>

That's it, fire up NUnit.. if all goes well, you'll be running in STA and WPF will 'show itself'.

Graphs in Ruby On Rails

Well I was building my personal expense tracking web app and soon enough I need to graph the top expense categories... A little searching led to gruff and it worked right outta the box.

First up, Gruff needs RMagick which needs ImageMagick. Tricky.. so go here and I hit Q#7
Soon you should land up here download the rmagick-win32 gem zip. This archive has both RMagick gem and the install for the right version of ImageMagick. Installing the latest version of ImageMagick may not work.. (I didnt try.. the docs warned me off).
So install ImageMagick first. Next install the gem with 'gem install rmagick --local' in the folder where you unzipped the archive.
Now switch to your Rails Dev Environment. Open up config/environment.rb - Add this line at the end 'require gruff'
Add a new action to the controller of your choice.

  def showTopN
g = Gruff::Bar.new
g.title = "Hello Gruff"
g.theme_37signals

Expense.get_top_n_categories(:limit=>10).each{ |category|
g.data(category.name, category.total_amount.to_f)
}

send_data g.to_blob, :disposition=>'inline', :type=>'image/png', :filename=>'top_n.pdf'
end

Now close all command windows. Restart Webrick (your Rails Web Server) to take into account changes made to your PATH env var.
Fire up a browser.. in my case to http://localhost:3000/outflow/showTopN

Update:Normal people would be overjoyed at this point.. but not me. I wanted to create this graph at runtime .. So when I click on a link that I imagine will be there, I callback on my controller (yes using AJAX), create my graph and then slot it into a div tag on the same page. This will help me chart different sub-sets of data without leaving the page.
But how? so I post a question on stackoverflow
Solution:
report_controller.rb

require 'gruff'
class ReportController < ApplicationController
 def showTopNCategories

 end

 def drawBarChart
g = Gruff::Bar.new
     g.title = "Hello Gruff"
     g.theme_37signals
  
     Expense.get_top_n_categories(:limit=>10).each{ |category|
         g.data(category.name, category.total_amount.to_f)
     }
     g.write('public/images/top_n.png')
     render(:text=>'top_n.png')
 end
end

showTopNCategories.rhtml


<%content_for("page_scripts") do -%>
function updateImg(id, imageFile)
{
 $(id).innerHTML = '<img src="/images/' + imageFile + '"/>';
}
<% end -%>

<!-- some controls -->
<%= link_to_remote("Refresh Chart",
 :complete => "updateImg('div_barchart', request.responseText)",
 :url=>{:action=>'drawBarChart'})%>

<div id="div_barchart">

</div>

application.rhtml


<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- the usual -->

<script type="text/javascript"> <%= @content_for_page_scripts %> </script>
</head>
<!-- rest of the usual -->

The quirk here was that

gruff considers the Rails Project folder as the base folder.. so you need public/images/FileName.png as the parameter to g.write. Rails views however have public as the base folder. So image paths are relative.. /images/FileName.png
the javascript function didn't wire up correctly. It dies silently if you make a typo for instance. Use alert("Hit!"); liberally to know if the function is being hit.

Optional parameters in Ruby

Well I keep forgetting this. There's a way to pass in a hash of optional parameters as the last parameter to a method.. how do I do that? I'll save you some time. Here's how.

  def Expense.get_top_n_categories options={}
sQuery = "SELECT categories.name, sum(amount) as total_amount
  from expenses
  join categories on category_id = categories.id
  group by category_id
  order by total_amount desc";
sQuery += " limit #{options[:limit].to_i}" if !options[:limit].nil?
Expense.find_by_sql(sQuery)
end

# Now you see a param.. top 10  
obCategories = Expense.get_top_n_categories :limit => 10

# now you don't
obCategories = Expense.get_top_n_categories

The cleverness lies in defaulting the options param to an empty hash. Now you can pass any number of params using the param => value notation.. everything comes neatly packaged to you as an options hash.

How did you manage to get the code formatted so pretty? http://blog.wolfman.com/articles/2006/05/26/howto-format-ruby-code-for-blogs

ShowMeTheMoney-16 Let's play tag the expense with a category

Well I know its been a long time... I took a break.. I got married in it. Long story short... its been a while. But now let's end what I started
Basically we want to tag each expense with some text. Not much time would be spent on the updating and deleting categories. If we can just create categories and select existing ones, we should be good. We’ll need to get some confirmation from the user on this.. So we’ll defer the delete for sometime.
So basically a dropdown with existing categories – that is editable. So while entering the expense entry, the user can either choose an existing category or add a new one.

I wrote up an acceptance test. Although the former is more readable (being a DoFixture.. however I find that FitLibrary isn't ported yet for Ruby) So we have to make do with the corresponding ActionFixture.. more verbose but will do.

First lets add a migration to add the categories table and add a foreign key in the expenses table to refer to the former.
> ruby script/generate migration add_categories
This will create the corresponding file in db\migrate folder.

class AddCategories < ActiveRecord::Migration
def self.up
create_table :categories do |table|
table.column :name, :string, :limit => 80
end
execute "INSERT INTO categories VALUES (1, 'General');"
add_column :expenses, :category_id, :integer, {:null=>false, :default =>1}
execute "alter table expenses add constraint fk_expenses_to_categories foreign key(category_id) references categories(id)"

end

def self.down

execute "ALTER TABLE expenses DROP FOREIGN KEY fk_expenses_to_categories"
remove_column :expenses, :category_id
drop_table :categories

end
end

Now to run this migration against all three databases dev, prod and test
> rake db:migrate RAILS_ENV="test"

Add the category model with the unit tests for that one. We focus only on creation and retrieval .. don’t see any edits or deletes happening for categories

class CategoryTest < Test::Unit::TestCase
def test_create …
def test_saveWithoutCategoryName_fails ….
end

Now onto expense_test.rb
We need to

create the expense category if it does not exist
use the existing category if possible
ensure that the associated category is retrieved from the plot

the first two should be the responsibility of the outflow_controller
create categories.yml first. update expense model tests one at a time to account for the category.
Update OutflowController to create or use-existing category on creation of expense entry.

class OutflowController < ApplicationController
...

auto_complete_for :category, :name

def create
raise ArgumentError, "Category missing!", caller[1..-1]  if (params[:category].nil? || params[:category][:name].nil?)  

sCategoryName = (params[:category][:name]).strip
category = Category.find(:first, :conditions=>["name = ?", sCategoryName])
category = Category.create( {:name => sCategoryName} ) if (category.nil?)
params[:expense][:category_id] = category.id

@expense = Expense.new(params[:expense])
if (@expense.save)
 flash[:notice] = 'Expense entry saved'
 redirect_to :action=>'list'
else
 render :action=>'new'
end
end
end

Update views to have an auto-completing field for the user to enter the category for an expense. This is a one-line addition in the controller as shown below coupled with the following in the view.
/app/views/outflow/_form.rhtml


<tr>
<td> <%= getAppString(:label_category) %> </td>
<td>
<%= text_field_with_auto_complete :category, :name%>
</td>
</tr>

Right on! Let’s complete the implementation required for the second part of our acceptance test.

class ExpenseGetTopNCategoriesTest < Test::Unit::TestCase
fixtures :expenses, :categories

def test_getTopN_categories
 obCategories = Expense.get_top_n_categories

 assert_equal 3, obCategories.length
 assert_equal categories(:rent_category).name, obCategories[0].name
 assert_in_delta expenses(:rent_expense_entry).amount, obCategories[0].total_amount, 0.01

 assert_equal categories(:apparel_category).name, obCategories[1].name
 assert_in_delta expenses(:apparel_expense_entry).amount, obCategories[1].total_amount, 0.01

 assert_equal categories(:entertainment_category).name, obCategories[2].name
 assert_in_delta expenses(:movies_expense_entry).amount + expenses(:another_movie_expense_entry).amount,
     obCategories[2].total_amount, 0.01
end

...

run ruby script/server –s
a little playing around helps me settle on this query

Expense.find_by_sql("SELECT categories.name, sum(amount) as total_amount
from expenses
join categories on category_id = categories.id
group by category_id
order by total_amount desc")

Stackoverflow users help me to convert this into Rails Model speak

Expense.find(:all, :select =>

"categories.name name, sum(amount)
total_amount",
     :joins => "inner join categories on category_id =
categories.id",
     :group => "category_id",
     :order => "total_amount desc"

That turns out fine. I add more tests and a couple of more tables to the acceptance test to retrieve top n categories, top categories in a date range, top n categories in a date range. Soon enough...

Check in everything. Repeat the same for tagging inflows with categories. Should be easy. Until next time...