Test driving an infix to postfix expression converter (2 of 3)

So at the end of Post 1 of this series. we've a converter that doesn't yet understand operator precedence.
Let me refresh my memory before proceeding..
An operator token belongs to a class/family of 'Operators', has a string representation like *, +, /, - and finally has a relative precedence associated with it.
1 for +, -
2 for *, /
3 for (, )

Seems like we need an Operator class instead of just a string. I'll mark the failing test with [Ignore("Detour")] tag while we go make some operators.
Create a new test fixture, imagine a magical Operator class.. something like this


TestOperator.cs

[Test]
public void TestParse_PlusOperator()
{
Operator plusOp = Operator.Parse("+");
Assert.IsNotNull(plusOp);
Assert.AreEqual("+", plusOp.ToString());
Assert.AreEqual(1, plusOp.PrecedenceLevel);
}


I turn my ideas about Operators into a Type/Class.

public class Operator
{
private string m_OperatorSymbol;
private int m_iPrecedenceLevel;
private static readonly Operator plusOperator = new Operator("+", 1);

private Operator(string sSymbol, int iPrecedenceLevel)
{
m_OperatorSymbol = sSymbol;
m_iPrecedenceLevel = iPrecedenceLevel;
}

public int PrecedenceLevel
{
get { return m_iPrecedenceLevel; }
}
public override string ToString()
{ return m_OperatorSymbol; }


public static Operator Parse(string stringToParse)
{ return Operator.plusOperator; }
}



Green! I took the quick way out with the Parse implementation.. I'll let the coming tests force me to distill that bit of code. I add similar tests for Minus (and then Multiply and Divide). I end up with a need for a 'switch case' in Parse().
Also Parse should probably throw if its passed a token that it doesnt recognize. I'll note that down as a test too.


[Test]
[ExpectedException(ExceptionType=typeof(ArgumentException), ExpectedMessage="Unknown Operator! Cannot parse Gishu")]
public void TestParse_InvalidInput()
{
Operator noOp = Operator.Parse("Gishu");
Assert.Fail("Operator learnt some AI to parse Gishu. This should have blown up by now!");
}


[Furious typing] Alright now Operator.Parse() looks like this.. all its tests pass.

public static Operator Parse(string stringToParse)
{
switch (stringToParse)
{
case "+":
return Operator.plusOperator;
case "-":
return Operator.minusOperator;
case "*":
return Operator.multiplyOperator;
case "/":
return Operator.divisionOperator;

default:
throw new ArgumentException("Unknown Operator! Cannot parse " + stringToParse);
}
}


Now lets update our tokenizer to use this new type. This looks like it might break a few tests.. but with baby steps and a vigilant eye, it'll all be better by the time I'm done with it. I'll start by modifying one of my initial tokenizer tests...

[RowTest]
[Row("+")]
[Row(" + ")]
public void ReadPlusOperator(string sInput)
{
Tokenizer t = new Tokenizer(sInput);

//Assert.AreEqual("+", t.GetNextToken());
Assert.AreEqual(Operator.Plus, t.GetNextToken());
}


This would require us to make the static plusOperator member public. The other option is to check each member of the operator object… which would be duplication of the testfixture for Operator. So lets share our operators with the world.. make em public and rename.

public class Operator
{
public static readonly Operator Plus = new Operator("+", 1);
//...

And we're red...
TestDynCalc.TestTokenizer.ReadPlusOperator( + ):
Expected: <+>
But was: "+"


Instead of making a big bang change.. let’s make an ugly hack inside Tokenizer#GetNextToken() to make just this test pass..

Regex regexOperator = new Regex(@"^\s*([-+*/()])");
obResults = regexOperator.Match(m_sInputData);
if (obResults.Success)
{
Group obGroup = obResults.Groups[1];
RemoveReadCharacters(obGroup);

//return obGroup.Value;
if (obGroup.Value != "+")
return obGroup.Value;
else
return Operator.Plus;

}

I broke 2 tests - that tested the tokenizer with complete expressions containining +. Turns out its easy to fix them.

[Test]
public void ReadMultipleTokens()
{
Tokenizer t = new Tokenizer(" 2 + 3 - 34 * 2 / 1 ");
//object[] arrExpectedTokens = new object[] { 2, "+", 3, "-", 34, "*", 2, "/", 1 };
object[] arrExpectedTokens = new object[] { 2, Operator.Plus, 3, "-", 34, "*", 2, "/", 1 };
foreach (object token in arrExpectedTokens)
{
Assert.AreEqual(token, t.GetNextToken());
}
}

All green. Excuse me while I repeat the procedure for all the other operators. The hack in Tokenizer#GetNextToken() can now be replaced by ...

return Operator.Parse(obGroup.Value);


However we forgot the parantheses bros. while creating our operators. We need more operators! Mark Tokenizer with Ignore(“2 TDDed operators coming right up…”).. [more furious typing] we have new operators. Take off the ignore tag. Similar changes… and Tokenizer is now Operator-aware.
Time to take off the ignore tag on ExpressionConverter tests and end our little detour.
So we're back to red.. but this time, we've operators with precedence. [after some mental acrobatics with a pencil and an eraser on paper]
The key to getting this is to check if the operator on top of stack is not a '(' and has higher precedence than the incoming/current token. If yes, pop the stack, write popped operator to output and retry. If no, push the incoming token onto the stack.
So + should not be pushed if the top of stack[TOS] contains *; however * should be if TOS is +.
So we add some more code to the final else block in ExpressionConverter#ConvertToPostfix..


if (obStack.Count() > 0)
{
Operator termOnTopOfStack = obStack.Peek() as Operator;
if ((termOnTopOfStack != Operator.OpenParantheses)
&& (termOnTopOfStack.PrecedenceLevel > (term as Operator).PrecedenceLevel))
{
postfixExpr.Add(obStack.Pop());
continue;
}
}

obStack.Push(term);


All Green! Woohoo! However the method seems to have grown to an uncomfortable size for me... maybe I can refactor.

  • Rename method to ExpressionConverter#ConvertInfixToPostfix()

  • Stack always contains Operators.. So we can do away the casting...Stack stackOfOperators

  • Replace ‘term.ToString() == ")" ‘ with type-based checks

  • Each of ( and ) is a parenthesis together they make a set of parentheses. http://en.wikipedia.org/wiki/Parenthesis... well you live, you read wikis, you learn. Correct typos

  • Extract some methods to clarify intent.
    Had to promote the stack and postfixExpr variables to class members. (All the extracted methods required the stack and List variables to be passed in)



Looking good. I'll post the refactored version shortly. Let's try another expression/test.


[Row("0-12000-(16*4)-20", "0 12000 - 16 4 * - 20 -")]


Red again! [A few brain cycles later]
This breaks because the converter doesn’t recognize Left to Right precedence for operators at the same level. (In 0 - 5 + 3, - should be performed first to get the result of -2 even though precedence-wise + and - are equivalent.)
But this can be easily solved by replacing > with >= where we compare operator precedence. Here’s the final version of the class.


public class ExpressionConverter
{
List<object> m_postfixExpr = new List<object>();
Stack<Operator> m_stackOfOperators = new Stack<Operator>();
public List<Object> ConvertInfixToPostfix(string sInfixExpr)
{
m_postfixExpr.Clear(); m_stackOfOperators.Clear();

Tokenizer parser = new Tokenizer(sInfixExpr);
object term = parser.GetNextToken();
while (term != null)
{
if (term is int)
{
m_postfixExpr.Add(term);
}
else if (term == Operator.CloseParenthesis)
{
MoveOperatorsOnStackToPostfixExprTillMatchingOpenParenthesisIsFound();
}
else
{
Operator operatorTerm = term as Operator;
if (OperatorOnTopOfStackHasHigherOrSamePrecedenceAs(operatorTerm))
{
m_postfixExpr.Add(m_stackOfOperators.Pop());
continue;
}

m_stackOfOperators.Push(operatorTerm);
}

term = parser.GetNextToken();
}

PopAllRemainingOperatorsOnStackToPostfixExpr();
return m_postfixExpr;
}

private bool OperatorOnTopOfStackHasHigherOrSamePrecedenceAs(Operator term)
{
if (m_stackOfOperators.Count() == 0)
return false;

Operator termOnTopOfStack = m_stackOfOperators.Peek();
return ((termOnTopOfStack != Operator.OpenParenthesis)
&& (termOnTopOfStack.PrecedenceLevel >= term.PrecedenceLevel));
}

private void MoveOperatorsOnStackToPostfixExprTillMatchingOpenParenthesisIsFound()
{
Operator term = m_stackOfOperators.Pop();
while (term != Operator.OpenParenthesis)
{
m_postfixExpr.Add(term);
term = m_stackOfOperators.Pop();
}
}

private void PopAllRemainingOperatorsOnStackToPostfixExpr()
{
while (m_stackOfOperators.Count > 0)
m_postfixExpr.Add(m_stackOfOperators.Pop());
}
}


Code is good. End of post. In the next post, I'll write a console 'main' app to exercise the converter and evaluate the expression.

2 comments:

  1. hey thanks .. i deduced your code (there's some missing from the post) converted it to java, added handling of negative numbers ( ie "30 / (-4 + 1) * 3" ) and am using this in a production app.

    ReplyDelete