NMock 3.5

September 23rd, 2007

One of the places where I am likely to actively use expression trees is testing/mocking.
(I am not the first one to notice this — I saw some post on expression tree asserts, but I lost it).

In my day-to-day development I am using NMock2.
Ayende‘s Rhino Mocks may be more convenient, but I do not like the syntax.
I just can’t read “expect call return” — my mind wants “expect call returns”.
Also, event support in Rhinos is really cryptic.
(I understand that it may be the only stringless way to express it).

So working on C# 3.0 project, I’ve looked into a stringless experience with NMock2.
What I wanted was to pass a method call expression, to change this:

Expect.Once.On(fs)
    .Method("GetDescendantDirectories")
    .With(RootPath)
    .Will(Return.Value(directories));

into this

Expect.Once
    .That(() => fs.GetDescendantDirectories(RootPath))
    .Will(Return.Value(directories));

Fortunately, it was quite easy to do with extension methods.
I have spent more time figuring better fluent syntax than actually writing it.

But the case where expression trees really shine, I think, is in the possibility to do this:

Expect.Exactly(directory.Files.Count)
    .That(() => access.IsPublicFile(Any.String))
    .Will(Return.Value(true));

I can analyze expression tree and find out that Any.String is not “a string”, but “any string”.
It is also easy to imagine Any.String.Matching(“^whatever”) and so on.
I have not tried to implement it, but I feel that it would be simple as well.

Also, while writing the post, I just got an idea — in the world of extenson methods we can do

Expect.Exactly(directory.Files.Count)
    .That(() => access.IsPublicFile(Any.String))
    .Will.Return(true);

without sacrificing extensibility.
“Will” can also be chained if more than one action is needed (however, a delegate would be better for this).

It would be nice to actually add this to NMock2 (as 3.5 branch), but they are using CVS, not SVN.
Also, am I the only one who thinks that SourceForge is slow and has hardly usable (bloated) design?

So if you want the code, you can ask me in comments, but it is primitive indeed.

Some days ago I wrote an overview of typeswitch problem.
Now it’s time to give some solutions.

Overview

Let’s imagine a document inheritance hierarchy:

XsltDocument : XmlDocument : Document
TextDocument : Document

Let’s assume I want to get strings “Xml”, “Xslt”, “Not Xml and not Xslt” based on the document runtime type.
This is primitive indeed, but it does demonstrate a concept.

I call the most useful soultion fluent switch:

string result = Switch.Type(document).
        Case(
            (XsltDocument d) => "Xslt"
        ).
        Case(
            (XmlDocument d) => "Xml"
        ).
        Otherwise(
            d => "Not Xml and not Xslt"
        ).
        Result;

It does contain a lot of visual clutter, but it scales quite well in comparison to if/return approach.
The code behind is simple — each Case checks type and returns itself.
Case is a generic method whose parameters are inferred from the lambda.

For this task, case bodies do not depend on actual object contents.
So they can be expressed cleaner:

string result = Switch.Type(document).To<string>().
        Case<XsltDocument>("Xslt").
        Case<XmlDocument>("Xml").
        Otherwise("Not Xml and not Xslt").
        Result;

This time I have to specify the result type explicitly.

The fluent switch syntax is quite powerful — you can even add cases dynamically.
This is a nice difference from conventional language constructs.

Alternatives

I have tried a number of alternatives, but no one of them did better.

  1. Many overloads switch
    string result = Switch.Type(
        document, 
            (XsltDocument d) => "Xslt",
            (XmlDocument d) => "Xml",
            d => "Not Xml and not Xslt"
    );

    This syntax is quite concise and understandable.
    But it requires an additional overload for each additional case.
    So it is quite impractical.

  2. Object initializer switch
    string result = new TypeSwitch<Document, string>(document) {
        (XsltDocument d) => "Xslt",
        (XmlDocument d) => "Xml",
        d => "Not Xml and not Xslt"
    };

    Also more concise than my original solution, but much more cryptic.
    Constructor and generic parameters also add a degree of confusion.

    The most interesting thing about this syntax was that it actually worked.
    It seems object initializers have some nice fluent power.

Compilation

After running some benchmarks, I found that fluent switch is about 200 times slower than hardcoded ifs.
It may be perfectly acceptable, of course.
However, I have found a way to precompile the switch using expression trees.

From the usage perspective, precompiled switch is just a Func<T, TResult> (it does not support Actions right now).
So you can cache it in

private static readonly Func<Document, string> CompiledFluentLambdaSwitch = Switch.Type<Document>().To<string>().
    Case(
        (XsltDocument d) => "Xslt"
    ).
    Case(
        (XmlDocument d) => "Xml"
    ).
    Otherwise(
        d => "Not Xml and not Xslt"
    ).
    Compile();

which is extremely similar to the first code sample.
The differences are that you do not specify what you are switching on (it would be a function parameter).
But you do explicitly specify from/to types.

The compilation process was fun to write, since it was the first time I dug into expressions trees.
Statements are not supported in trees, so I had to use embedded ConditionalExpressions for cases.
The resulting tree is something like

d => (d is XsltDocument)
             ? ((cast => "Xslt")(d as XsltDocument)) 
             : ((d is XmlDocument) 
                   ? ((...

I have not found a way to cache cast and null-check it, so I cast/typecheck it two times.

Benchmarks

The best thing about compilation is performance:

Benchmark: 1000000 iterations, two switch calls per iteration.

Benchmark overhead:            40.1ms   30.0ms   30.0ms   30.0ms |    32.5ms
Direct cast:                   80.1ms   60.1ms   80.1ms   60.1ms |    70.1ms
Fluent switch
    on lambdas:              1512.2ms 1502.2ms 1602.3ms 1482.1ms |  1524.7ms
    on lambdas (compiled):     90.1ms  110.2ms   80.1ms   80.1ms |    90.1ms
    on constants:            1281.8ms 1271.8ms 1311.9ms 1271.8ms |  1284.3ms
    on constants (compiled):   80.1ms   90.1ms   90.1ms   70.1ms |    82.6ms
Many overloads switch:        440.6ms  390.6ms  430.6ms  420.6ms |   420.6ms
Object initializer switch:    751.1ms  681.0ms  741.1ms  751.1ms |   731.1ms

As you can see, precompiled switch is nearly as performant as hardcoded one (direct cast).
I am quite impressed by simplicity/power ratio of the expression trees.

Code

I uploaded AshMind.Constructs to Google Code.
I see it as a learning/research project, but you can put it to any practical use.

This is the first post in a two-post series on a typeswitch implementation in C#.
This one contains a problem statement and possible solutions in other languages.
The second one will contain a Switch.Type description and benchmarks.

Somewhat often I find myself writing code to do something based on runtime type of a value.
A classic case is to filter a tree with different types of nodes (expression tree, for example).

The code often looks like

if (x is A) {
    DoWithA(x as A);
}
else if (x is B) {
    DoWithB(x as B);
}
else
    // ...

Or, if you are a heavy performance freak like me it is like

A a = x as A;
if (a != null) {
    DoWithA(a);
    return;
}

B b = x as B;
if (b != null) {
    DoWithB(b);
    return;
}

// ...

After second type it really starts to smell.

I could have used a Visitor.
But I really dislike it due to the coupling between the Visitor interface and the underlying class hierarchy.
Also requires me to extend hierarchy with a zero value Accept method.

I can also use some kind of hashtable-based smart resolver, but it would be complex and slow.

Actually, that is not an obscure problem and a lot of other languages have their solutions.
There are two common ones:

  1. OO concept known as multiple dispatch.
    Multiple dispatch is just a bunch of “method overloads” resolved by runtime environment basing on the runtime argument types.
    This is quite different from ordinary method overloading — for example, in C# compiler picks an overloaded method during compilation.

    Actually, .Net has a way to do multiple dispatch through Reflection (Type.InvokeMethod), but it quite slow and not compiler-type-safe.

    There is a brilliant paper “Generalized Interfaces for Java” that gives some insight on useful multiple dispatch in Java/C#-like languages.
    Hopefully we’ll get that functionality in C# and CLR sooner or later.

  2. Functional language concept known as pattern matching.
    This is a kind of powerful switch/case statement (with a simplified syntax).
    I do not actually know much about functional languages, so that is my understanding.

The simplest possible construct (I do not want to dive into multiple dispatch) might look like this:

typeswitch (x) { 
   case (A a): DoWithA(a); break; 
   case (B b): DoWithB(b); break; 
   default: throw new ArgumentException(); 
}

And lcs Research C# Compiler has a similar syntax sample:

typeswitch (o) { 
    case Int32 (x): Console.WriteLine(x); break; 
    case Symbol (s): Symbols.Add(s); break; 
    case Segment: popSegment(); break; 
    default: throw new ArgumentException(); 
}

Cω language also had an actual typeswitch construct.
But I was not able to find out it’s syntax (my old VS.Net is somewhy ruined and web is silent on it).
Anyway, Andrey Titov (who I hope will also blog someday) reminded me that it compiled to zero IL (seems it was too experimental).

Stay tuned, next time we’ll see how it is possible to emulate typeswitch in C# 3.0.

Everything in this article is tested with Visual Studio 2008 Beta 2 and may become obsolete.

While a lot of people blog about expression trees in new C#, I haven’t seen any post about things you can not do with them. Maybe it is common knowledge, but since I stumbled in it myself, I’ll share.

Basically, C# specification says:

Not all anonymous functions can be represented as expression trees. For instance, anonymous functions with statement bodies, and anonymous functions containing assignment expressions cannot be represented. In these cases, a conversion still exists, but will fail at compile time.

So, that’s what you can not do according to the specification:

Expression<…> y = x => { DoAnything() };
// error CS0834: A lambda expression with a statement body cannot be converted to an expression tree

int z = 3;
Expression<…> y = () => z = z + 5;
// error CS0832: An expression tree may not contain an assignment operator.

To find out other limitations, I’ve looked in Microsoft.NET\Framework\v3.5\1033\cscompui.dll file (that contains all string resources (errors/warnings) for csc compiler) and did a search on “expression tree”.

So there is a summary table of all compiler errors on expression trees with sample code for each case:

Error code Error message Sample code
CS???? Partial methods with only a defining declaration or removed conditional methods cannot be used in expression trees. I see no way to use partial in expression tree.
Partial methods always return void, so they can be used only as a statement and not in a lambda with expression body.
CS0831 An expression tree may not contain a base access.
Expression<Func<string>> y = () => base.ToString();
CS0832 An expression tree may not contain an assignment operator. See above.
CS0834 A lambda expression with a statement body cannot be converted to an expression tree. See above.
CS0838 An expression tree may not contain a multidimensional array initializer.
Expression<Func<string[,]>> y = 
    () => new string[,] { { "A", "A"} };
CS0838 An expression tree may not contain an unsafe pointer operation. No sample, I am not very friendly with unsafe syntax.
CS1945 An expression tree may not contain an anonymous method expression.
Expression<Func<Func<string>>> y = () => delegate { return "hi"; };
// [NB] This woks just fine:
Expression<Func<Func<string>>> y = () => () => "hi";
CS1952 An expression tree lambda may not contain a method with variable arguments. This one was tricky:

public string ReturnHi(__arglist)
{
    return "hi";
}

public void Test()
{
    Expression<Func<string>> y = 
        () => this.ReturnHi(__arglist("stub1", "stub2"));
}

There are several error messages other than the first one in a table that I can not produce at all.

Error message Comment
An expression tree lambda may not contain an out or ref parameter. Lambdas indeed can not capture out and ref parameters, but the error in such case is
CS1628: Cannot use ref or out parameter ‘…’ inside an anonymous method, lambda expression, or query expression.
I could not create any kind of lambda for the delegate type with out or ref parameter, not only expression tree.
An expression tree lambda may not contain a member group. No problems creating lambdas that do any member group resolution. I have no idea how to put unresolved member group anywhere without getting it resolved.