Test driving an infix to postfix expression converter (1 of 2)

I've been wanting to understand infix to postfix conversion - strangely intriguing to me. Here's what I'm about to ramble on for a while today...
Given an expression in infix form (operators appear inline in the expression) like
3 * (2 + 5)
Transform it to its postfix equivalent
3 2 5 + *


This form can be easily evaluated by a computer. You keep pushing terms onto a stack.. everytime you hit a operator, you pop the last 2 values and perform the operation, push the result back on the stack. In time, you will run out of terms and have the result at the top of the stack.
e.g. Stack builds from left to right (TOS is extreme right)
Read 3 Stack [3]
Read 2 Stack [3 2]
Read 5 Stack [3 2 5]
Read + Stack [3 7] #Pop 5,2 Perform 2+5, Store 7 back on stack
Read * Stack [21] # Pop 7,3 Perform 3*7, Store 21 back on stack.


Time and again I've procrastinated.. Then I hit Bart De Smet's blog post dated Oct 2006 - It's time! :). You may want to read up on the algorithm to parse an infix expression.

Just to make things interesting I'm going to do it TDD style.. red-green-refactor to glory. I'm trying to shorten the post to the essence.

I've TDDed myself a Tokenizer, a class that parses a string, uses a couple of regular expressions and returns the next number/operator with each call to GetNextToken().

Tokenizer.cs


public class Tokenizer
{
string m_sInputData;
public Tokenizer(string sInput)
{ m_sInputData = sInput; }

public object GetNextToken()
{
Regex regexNumber = new Regex(@"^\s*(\d+)");
Match obResults = regexNumber.Match(m_sInputData);
if (obResults.Success)
{
Group obGroup = obResults.Groups[1];
RemoveReadCharacters(obGroup);
return Int32.Parse(obGroup.Value);
}

Regex regexOperator = new Regex(@"^\s*([-+*/()])");
obResults = regexOperator.Match(m_sInputData);
if (obResults.Success)
{
Group obGroup = obResults.Groups[1];
RemoveReadCharacters(obGroup);
returnobGroup.Value;
}

return null;
}

private void RemoveReadCharacters(Group obMatchedGroup)
{
m_sInputData = m_sInputData.Remove(0, obMatchedGroup.Index + obMatchedGroup.Length);
}
}


The tokenizer has around 20 tests. I'll post one of the latter tests to illustrate usage

TestTokenizer.cs

[Test]
public void ReadMultipleTokensWithParantheses()
{
Tokenizer t = new Tokenizer(" ( 1 + 2 ) * (3 - (8/2))");
object[] arrExpectedTokens = new object[] { "(", 1, "+", 2, ")", "*", "(", 3, "-", "(", 8, "/", 2, ")", ")" };
foreach (object token in arrExpectedTokens)
{
Assert.AreEqual(token, t.GetNextToken());
}
}


All set to start with the meat of the problem: ExpressionConverter. So we'll come up with out first test employing some wishful thinking. The output is going to be reordered sequence of numbers and operators.. hence List<objects>

TestExpressionConverter.cs

[Test]
public void TestConvert_Simple()
{
ExpressionConverter converter = new ExpressionConverter();
List<object> postfixExpr = converter.ConvertToPostfix(" 2 + 3 ");
StringBuilder sb = new StringBuilder();
foreach(object term in postfixExpr)
{
sb.AppendFormat("{0} ", term.ToString());
}
Assert.AreEqual("2 3 +", sb.ToString());
}


To make this compile, I need to define the ExpressionConverter class and the ConvertToPostfix method within it. Something like this

public List<Object> ConvertToPostfix(string sInfixExpr)
{
List<object> postfixExpr = new List<object>();
Stack<object> obStack = new Stack<object>();

Tokenizer parser = new Tokenizer(sInfixExpr);
object term = parser.GetNextToken();
while (term != null)
{
if (term is int)
{ postfixExpr.Add(term); }
else
{ obStack.Push(term); }

term = parser.GetNextToken();
}

while (obStack.Count > 0)
postfixExpr.Add(obStack.Pop());

return postfixExpr;
}


The simplest thing that works. Collect all numeric terms in the result List<>, push operators onto the stack. At the end, pop contents of the stack into the List. Green!
Let's turn on the heat, see how it deals with grouping. Since the test is basically the same, albeit with diff input and hence output, I found an unorthodox way to use the RowTest NUnit extension I blogged about earlier. I make a couple of changes to the test like this... The last parameter in the list is the ExpectedValue for the test. So now I can run the same test with different input and expected values by adding Row Attributes. Might be a bit hard to read for the new guy.. but not too bad.



[RowTest]
[Row(" 2 + 3 ", "2 3 +")]
[Row(" 2 + (30 + 50 ) ", "2 30 50 + +")]
public void TestConvertInfixToPostfix(string sInfixExpr, string sExpectedPostfixExpr)
{
ExpressionConverter converter = new ExpressionConverter();
List<object> postfixExpr = converter.ConvertInfixToPostfix(sInfixExpr);

StringBuilder sb = new StringBuilder();
foreach(object term in postfixExpr)
{
sb.AppendFormat("{0} ", term.ToString());
}
Assert.AreEqual(sExpectedPostfixExpr, sb.ToString().Trim());
}


This fails as expected. Our code doesn't do parentheses yet. When we read a ), we need to pop out all operators till the matching (. Both ( ) are discarded and never part of the postfix expression. We can make this change in ConvertToPostfix()

if (term is int)
{
postfixExpr.Add(term);
}

else if (term.ToString() == ")")
{
object poppedTerm = obStack.Pop();
while (poppedTerm.ToString() != "(")
{
postfixExpr.Add(poppedTerm);
poppedTerm = obStack.Pop();
}
}

else
{
obStack.Push(term);
}


and back to Green. Lets take it up a notch. Add a new Row attribute / new test with the following expression

[Row(" ( (10+20) + 30 ) * 20-8/4 ", "10 20 + 30 + 20 * 8 4 / -")]


The test fails. NUnit reports:
TestDynCalc.TestExpressionConverter.TestConvertToPostfix( ( (10+20) + 30 ) * 20-8/4 , 10 20 + 30 + 20 * 8 4 / -):
String lengths are both 25. Strings differ at index 16.
Expected: "10 20 + 30 + 20 * 8 4 / -"
But was: "10 20 + 30 + 20 8 4 / - *"
---------------------------^


Bitten by operator precedence.. * has precedence over -. Is this the end for our converter? Not quite.. Onto Part 2


No comments:

Post a Comment