Some days ago I wrote an overview of the typeswitch problem.
Now it’s time to give some solutions.

Overview

Let’s imagine a document inheritance hierarchy:

XsltDocument : XmlDocument : Document
TextDocument : Document

Let’s assume I want to get strings “Xml”, “Xslt”, “Not Xml and not Xslt” based on the document runtime type.
This is primitive indeed, but it does demonstrate a concept.

I call the most useful solution a fluent switch:

string result = Switch.Type(document).
        Case(
            (XsltDocument d) => "Xslt"
        ).
        Case(
            (XmlDocument d) => "Xml"
        ).
        Otherwise(
            d => "Not Xml and not Xslt"
        ).
        Result;

It does contain a lot of visual clutter, but it scales quite well in comparison to the if/return approach.
The code behind it is simple: each Case checks the type and returns the switch itself, so the calls chain.
Case is a generic method whose type parameters are inferred from the lambda.
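
Here is only a minimal sketch of that idea; the class and member names below are my illustrative assumptions, not the actual AshMind.Constructs source:

```csharp
using System;

// Sketch of the fluent switch; names and details are assumptions.
public static class Switch {
    public static TypeSwitch<T> Type<T>(T value) where T : class {
        return new TypeSwitch<T>(value);
    }
}

public class TypeSwitch<T> where T : class {
    private readonly T value;
    public TypeSwitch(T value) { this.value = value; }

    // TCase and TResult are inferred from the explicitly typed lambda.
    public TypeSwitchWithResult<T, TResult> Case<TCase, TResult>(Func<TCase, TResult> map)
        where TCase : class, T
    {
        return new TypeSwitchWithResult<T, TResult>(this.value).Case(map);
    }
}

public class TypeSwitchWithResult<T, TResult> where T : class {
    private readonly T value;
    private bool matched;

    public TResult Result { get; private set; }

    public TypeSwitchWithResult(T value) { this.value = value; }

    // Each Case checks the runtime type and returns this, so calls chain.
    public TypeSwitchWithResult<T, TResult> Case<TCase>(Func<TCase, TResult> map)
        where TCase : class, T
    {
        var cast = this.value as TCase;
        if (!this.matched && cast != null) {
            this.Result = map(cast);
            this.matched = true;
        }
        return this;
    }

    public TypeSwitchWithResult<T, TResult> Otherwise(Func<T, TResult> map) {
        if (!this.matched) {
            this.Result = map(this.value);
            this.matched = true;
        }
        return this;
    }
}
```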

For this task, the case bodies do not depend on the actual object contents,
so they can be expressed more cleanly:

string result = Switch.Type(document).To<string>().
        Case<XsltDocument>("Xslt").
        Case<XmlDocument>("Xml").
        Otherwise("Not Xml and not Xslt").
        Result;

This time I have to specify the result type explicitly.

The fluent switch syntax is quite powerful — you can even add cases dynamically.
This is a nice difference from conventional language constructs.

Alternatives

I have tried a number of alternatives, but none of them did better.

  1. Many overloads switch
    string result = Switch.Type(
        document, 
            (XsltDocument d) => "Xslt",
            (XmlDocument d) => "Xml",
            d => "Not Xml and not Xslt"
    );

    This syntax is quite concise and understandable.
    But it requires an additional overload for each additional case.
    So it is quite impractical.

  2. Object initializer switch
    string result = new TypeSwitch<Document, string>(document) {
        (XsltDocument d) => "Xslt",
        (XmlDocument d) => "Xml",
        d => "Not Xml and not Xslt"
    };

    Also more concise than my original solution, but much more cryptic.
    Constructor and generic parameters also add a degree of confusion.

    The most interesting thing about this syntax was that it actually worked.
    It seems object initializers have some nice fluent power.
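
As far as I understand, it works because collection initializer elements are compiled into Add(…) calls. So, roughly, something like this should be enough (again, a sketch with assumed names, not the actual code):

```csharp
using System;
using System.Collections;

// Sketch: a collection initializer element becomes an Add(...) call, so the
// switch only needs IEnumerable, matching Add overloads, and an implicit
// conversion to the result type. All names here are assumptions.
public class TypeSwitch<T, TResult> : IEnumerable where T : class {
    private readonly T value;
    private bool matched;
    private TResult result;

    public TypeSwitch(T value) { this.value = value; }

    // Picked for explicitly typed lambdas like (XsltDocument d) => "Xslt".
    public void Add<TCase>(Func<TCase, TResult> map) where TCase : class, T {
        var cast = this.value as TCase;
        if (!this.matched && cast != null) {
            this.result = map(cast);
            this.matched = true;
        }
    }

    // Picked for the implicitly typed "otherwise" lambda.
    public void Add(Func<T, TResult> otherwise) {
        if (!this.matched) {
            this.result = otherwise(this.value);
            this.matched = true;
        }
    }

    // Makes "string result = new TypeSwitch<...>(...) { ... };" compile.
    public static implicit operator TResult(TypeSwitch<T, TResult> @switch) {
        return @switch.result;
    }

    IEnumerator IEnumerable.GetEnumerator() {
        throw new NotSupportedException();
    }
}
```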

Compilation

After running some benchmarks, I found that fluent switch is about 200 times slower than hardcoded ifs.
It may be perfectly acceptable, of course.
However, I have found a way to precompile the switch using expression trees.

From the usage perspective, a precompiled switch is just a Func&lt;T, TResult&gt; (it does not support Actions right now).
So you can cache it in

private static readonly Func<Document, string> CompiledFluentLambdaSwitch = Switch.Type<Document>().To<string>().
    Case(
        (XsltDocument d) => "Xslt"
    ).
    Case(
        (XmlDocument d) => "Xml"
    ).
    Otherwise(
        d => "Not Xml and not Xslt"
    ).
    Compile();

which is extremely similar to the first code sample.
The difference is that you do not specify what you are switching on (it becomes a function parameter),
but you do explicitly specify the from/to types.

The compilation process was fun to write, since it was the first time I dug into expression trees.
Statements are not supported in trees, so I had to use nested ConditionalExpressions for the cases.
The resulting tree is something like

d => (d is XsltDocument)
             ? ((cast => "Xslt")(d as XsltDocument)) 
             : ((d is XmlDocument) 
                   ? ((...

I have not found a way to cache the cast and null-check it, so the value gets type-checked/cast twice.
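
The whole compilation step can be sketched like this (a simplified reconstruction of the idea, not the actual library code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class SwitchCompiler {
    // Simplified reconstruction: fold the cases right-to-left into nested
    // ConditionalExpressions, starting from the Otherwise body.
    public static Func<T, TResult> CompileCases<T, TResult>(
        IList<KeyValuePair<Type, LambdaExpression>> cases,
        Expression<Func<T, TResult>> otherwise)
    {
        var parameter = Expression.Parameter(typeof(T), "d");

        Expression body = Expression.Invoke(otherwise, parameter);
        for (var index = cases.Count - 1; index >= 0; index--) {
            var caseType = cases[index].Key;
            body = Expression.Condition(
                Expression.TypeIs(parameter, caseType),        // d is TCase
                Expression.Invoke(
                    cases[index].Value,
                    Expression.TypeAs(parameter, caseType)     // d as TCase
                ),
                body
            );
        }

        return Expression.Lambda<Func<T, TResult>>(body, parameter).Compile();
    }
}
```

Note that the TypeIs/TypeAs pair in each branch is exactly the double type check mentioned above.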

Benchmarks

The best thing about compilation is performance:

Benchmark: 1000000 iterations, two switch calls per iteration.

Benchmark overhead:            40.1ms   30.0ms   30.0ms   30.0ms |    32.5ms
Direct cast:                   80.1ms   60.1ms   80.1ms   60.1ms |    70.1ms
Fluent switch
    on lambdas:              1512.2ms 1502.2ms 1602.3ms 1482.1ms |  1524.7ms
    on lambdas (compiled):     90.1ms  110.2ms   80.1ms   80.1ms |    90.1ms
    on constants:            1281.8ms 1271.8ms 1311.9ms 1271.8ms |  1284.3ms
    on constants (compiled):   80.1ms   90.1ms   90.1ms   70.1ms |    82.6ms
Many overloads switch:        440.6ms  390.6ms  430.6ms  420.6ms |   420.6ms
Object initializer switch:    751.1ms  681.0ms  741.1ms  751.1ms |   731.1ms

As you can see, the precompiled switch is nearly as performant as the hardcoded one (direct cast).
I am quite impressed by the simplicity/power ratio of expression trees.

Code

I uploaded AshMind.Constructs to Google Code.
I see it as a learning/research project, but you can put it to any practical use.

This is the first post in a two-post series on a typeswitch implementation in C#.
This one contains a problem statement and possible solutions in other languages.
The second one will contain a Switch.Type description and benchmarks.

Somewhat often I find myself writing code that does something based on the runtime type of a value.
A classic case is to filter a tree with different types of nodes (expression tree, for example).

The code often looks like

if (x is A) {
    DoWithA(x as A);
}
else if (x is B) {
    DoWithB(x as B);
}
else
    // ...

Or, if you are a performance freak like me, it looks like this:

A a = x as A;
if (a != null) {
    DoWithA(a);
    return;
}

B b = x as B;
if (b != null) {
    DoWithB(b);
    return;
}

// ...

After the second type it really starts to smell.

I could have used a Visitor.
But I really dislike it due to the coupling between the Visitor interface and the underlying class hierarchy.
It also requires me to extend the hierarchy with a zero-value Accept method.

I can also use some kind of hashtable-based smart resolver, but it would be complex and slow.

Actually, that is not an obscure problem and a lot of other languages have their solutions.
There are two common ones:

  1. OO concept known as multiple dispatch.
    Multiple dispatch is just a bunch of “method overloads” resolved by the runtime environment based on the runtime argument types.
    This is quite different from ordinary method overloading: in C#, for example, the compiler picks an overloaded method at compile time.

    Actually, .Net has a way to do multiple dispatch through Reflection (Type.InvokeMember), but it is quite slow and not type-safe at compile time.

    There is a brilliant paper “Generalized Interfaces for Java” that gives some insight on useful multiple dispatch in Java/C#-like languages.
    Hopefully we’ll get that functionality in C# and CLR sooner or later.

  2. Functional language concept known as pattern matching.
    This is a kind of powerful switch/case statement (with a simplified syntax).
    I do not actually know much about functional languages, so that is my understanding.

The simplest possible construct (I do not want to dive into multiple dispatch) might look like this:

typeswitch (x) { 
   case (A a): DoWithA(a); break; 
   case (B b): DoWithB(b); break; 
   default: throw new ArgumentException(); 
}

And lcs Research C# Compiler has a similar syntax sample:

typeswitch (o) { 
    case Int32 (x): Console.WriteLine(x); break; 
    case Symbol (s): Symbols.Add(s); break; 
    case Segment: popSegment(); break; 
    default: throw new ArgumentException(); 
}

Cω language also had an actual typeswitch construct.
But I was not able to find out its syntax (my old VS.Net install is broken somehow, and the web is silent on it).
Anyway, Andrey Titov (who I hope will also blog someday) reminded me that it compiled to zero IL (it seems it was too experimental).

Stay tuned, next time we’ll see how it is possible to emulate typeswitch in C# 3.0.

Everything in this article is tested with Visual Studio 2008 Beta 2 and may become obsolete.

While a lot of people blog about expression trees in the new C#, I haven’t seen any post about the things you can not do with them. Maybe it is common knowledge, but since I stumbled into it myself, I’ll share.

Basically, C# specification says:

Not all anonymous functions can be represented as expression trees. For instance, anonymous functions with statement bodies, and anonymous functions containing assignment expressions cannot be represented. In these cases, a conversion still exists, but will fail at compile time.

So, here is what you can not do according to the specification:

Expression<…> y = x => { DoAnything(); };
// error CS0834: A lambda expression with a statement body cannot be converted to an expression tree

int z = 3;
Expression<…> y = () => z = z + 5;
// error CS0832: An expression tree may not contain an assignment operator.

To find out the other limitations, I looked into the Microsoft.NET\Framework\v3.5\1033\cscompui.dll file (which contains all the string resources (errors/warnings) for the csc compiler) and searched for “expression tree”.

So here is a summary table of all the compiler errors on expression trees, with sample code for each case:

CS????: Partial methods with only a defining declaration or removed conditional methods cannot be used in expression trees.
    I see no way to use a partial method in an expression tree: partial methods always
    return void, so they can be used only as a statement, and not in a lambda with an
    expression body.

CS0831: An expression tree may not contain a base access.

    Expression<Func<string>> y = () => base.ToString();

CS0832: An expression tree may not contain an assignment operator.
    See above.

CS0834: A lambda expression with a statement body cannot be converted to an expression tree.
    See above.

CS0838: An expression tree may not contain a multidimensional array initializer.

    Expression<Func<string[,]>> y = 
        () => new string[,] { { "A", "A"} };

CS1944: An expression tree may not contain an unsafe pointer operation.
    No sample: I am not very friendly with unsafe syntax.

CS1945: An expression tree may not contain an anonymous method expression.

    Expression<Func<Func<string>>> y = () => delegate { return "hi"; };
    // [NB] This works just fine:
    Expression<Func<Func<string>>> y = () => () => "hi";

CS1952: An expression tree lambda may not contain a method with variable arguments.
    This one was tricky:

    public string ReturnHi(__arglist)
    {
        return "hi";
    }

    public void Test()
    {
        Expression<Func<string>> y = 
            () => this.ReturnHi(__arglist("stub1", "stub2"));
    }

Besides the first one in the table, there are several error messages that I could not reproduce at all:

“An expression tree lambda may not contain an out or ref parameter.”
    Lambdas indeed can not capture out and ref parameters, but the error in that case is
    CS1628: Cannot use ref or out parameter ‘…’ inside an anonymous method, lambda expression, or query expression.
    I could not create any kind of lambda for a delegate type with an out or ref parameter, not only an expression tree.

“An expression tree lambda may not contain a member group.”
    No problems creating lambdas that do any member group resolution. I have no idea how
    to put an unresolved member group anywhere without it getting resolved.

Evaluating Javascript in WatiN

September 5th, 2007

The WatiN framework is quite cool, but it lacks two important things.
The first one is searching by CSS selectors or, at least, by class.
Find.ByCustom("className", "X") is way too ugly. Or am I missing something?

The second (more important) one is weak access to Javascript.
The first thing I wanted to do with WatiN was to change something and then check some script state.
And getting values out of script was not obvious.

I didn’t want to use Ayende’s evil hack (no harm intended, it gets the work done): putting javascript state into the DOM is not pretty and too string-oriented.
I thought that the browser COM interfaces should definitely have a way to get Javascript objects outside, that is just the way MS/COM people think.
Thanks to Jeff Brown’s comment for explaining the last obstacles.

So here goes the code.
It is quite basic, but it allows you to get the value of any Javascript evaluation.
As you can see, I have not included any error handling; I had no time to look into it.

    public static class JS {
        public static object Eval(Document document, string code) {
            IExpando window = JS.GetWindow(document);
            PropertyInfo property = JS.GetOrCreateProperty(window, "__lastEvalResult");

            document.RunScript("window.__lastEvalResult = " + code + ";");

            return property.GetValue(window, null);
        }

        private static PropertyInfo GetOrCreateProperty(IExpando expando, string name) {
            PropertyInfo property = expando.GetProperty(name, BindingFlags.Instance);
            if (property == null)
                property = expando.AddProperty(name);

            return property;
        }

        private static IExpando GetWindow(Document document) {
            return document.HtmlDocument.parentWindow as IExpando;
        }
    }
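
Hypothetical usage would look something like this (the URL and the expression are made up for illustration; in WatiN 1.x the IE class inherits from Document, so it can be passed directly):

```csharp
// Hypothetical usage; the page address and the expression are made up.
using (IE ie = new IE("http://localhost/test.html")) {
    object count = JS.Eval(ie, "document.getElementsByTagName('div').length");
    Console.WriteLine(count);
}
```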

Nitpicking:
By the way, Ayende, getting permalinks to comments in your blog is not obvious (I used View Source).

Yesterday I stumbled into a question of how to mock an internal interface.

The interface is visible in the test assembly thanks to the InternalsVisibleTo attribute, but Mockery was unable to create a mock of it.
The very first idea was an additional InternalsVisibleTo, but I had to find out what assembly name to use.

Since dynamic mocks are actual classes created on-demand for the specified interface, they do not belong to NMock2 assembly.
But they have their own dynamic assembly with consistent name — in case of NMock2 it is “Mocks” (I got it from Reflector).
So I used InternalsVisibleTo(“Mocks”) and it actually worked.
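
So the attribute set in the tested assembly ends up looking like this (the test assembly name is made up for illustration):

```csharp
// In AssemblyInfo.cs of the assembly that contains the internal interface.
// "MyProject.Tests" is a made-up name; "Mocks" is the dynamic assembly name
// NMock2 uses for its generated mock classes.
using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("MyProject.Tests")]
[assembly: InternalsVisibleTo("Mocks")]
```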

I am not sure if it would work with strong names, but, fortunately, I do not need to strong name this project.

Recently I got an optimization problem in ASP.Net.
To be short, I had a Repeater with custom (somewhat complex) template on my Page, and I wanted to reload it asynchronously.

The first solution was pure XP and didn’t consider performance at all: wrap the Repeater inside an UpdatePanel.
The problem was that the entire Page had to be repopulated on server just to get to the Repeater.

That gave me a choice of two headaches:

  1. Put all Page/Controls data into ViewState and bloat bandwidth.
  2. Query all additional data on the reload request and increase load on database to get data that will be thrown away.

To be honest, I could solve (2) with server-side cache, but, in my opinion, caching does not make ugly solutions any better, just faster.

So, naturally, my thought was to query the data-only WebService and then populate the Repeater on client.

And it was interesting to find out that Microsoft already has a client-side data binding solution within ASP.Net AJAX Futures.
I have found an excellent article on this matter by Xianzhong Zhu, “Unveil the Data Binding Architecture inside Microsoft ASP.NET Ajax 1.0” (Part 1, Part 2).

I will now give a quick summary on the overall client-side binding architecture.
In essence it is quite similar to the smart DataSource controls of ASP.Net 2.0:
There is a DataSource javascript component and a ListView javascript control with html template.
ListView passes data from/to DataSource control, and DataSource talks with a JSON Web Service as a backend.
Controls and their relations are described in text/xml-script (Futures-only feature).

Everything seems quite straightforward and easy to use, I was quite happy to find it.
One thing that bothers me is the performance of text/xml-script (it is parsed on client).
But it is a concern not related to the current story.
The other question is what to do when I want to databind a complex list (consisting of several embedded server user controls)?
I am going to find it out real soon.

Along the way, I have noticed that Sys.Preview.Data also introduces DataSets/DataTables to javascript.
That is quite funny. Personally, I never really considered DataSets acceptable anywhere above Persistence layer.
But I already thought about Persistence/DataAccess concept in javascript when I saw Gears.
And DataSets seem to fit ‘nicely’ to some GoogleGearsDataSource (it would be quite an experience to actually see one in real code).

Well, a javascript O/R Mapper, anyone?

Since my previous post on microformats, I have decided that my opinion in this matter needs more evidence.
While I could collect all following information before writing the post, I didn’t have enough motivation to do the research.
But now, after writing it, I have my self-esteem as a motivation.

Ok, so I proposed using (namespaced) custom tags instead of overloading existing ones.
Now let’s go scientific and see what questions this solution may raise.

  1. Do modern browsers support CSS styling for unknown tags in HTML documents?
  2. Can these tags be added to document without breaking standard compliance (validity)?
  3. What possible problems can arise from using non-standard tags in modern browsers?

For practical purposes, these can be converted into two main questions:

  1. Should custom tags work?
  2. Do custom tags work in modern browsers?

And the answers are:

  1. By default, no.
  2. Not perfectly, but yes.

Now let’s discuss it in detail.

To understand the first answer is to understand what exactly HTML is, what XML is, and what XHTML is.
The most important (maybe obvious) point is: HTML is not a subset of XML, and HTML is not compatible with XML.
HTML and XML are both subsets of SGML, and SGML does not provide a way to mix different subsets within a single document.
So custom XML tags are not allowed in an HTML document.

There are, however, some solutions that allow arbitrary XML to be placed in an HTML document.
For example, Microsoft has XML Data Islands.
But they can be considered grammar hacks due to the XML-HTML incompatibility.

Practically, however, HTML documents have to be viewed as “tag soup” by the browsers, so custom tags do not cause document rendering to fail.

So, if I am formally out of luck with HTML, what about XHTML?
For simplicity, one can view XHTML as a rewrite of HTML to follow XML rules.
So any custom tags should be allowed in XHTML, as long as they are properly namespaced.

But there are a lot of problems with authoring XHTML.
While some of them are more like challenges (script/style syntax), one is extremely important.
The only way to tell modern browsers that the document is XHTML is to serve it as application/xhtml+xml
(see this document for an excellent explanation).
And Internet Explorer doesn’t support XHTML at all, so it refuses to render application/xhtml+xml.
(It doesn’t mean IE can’t open XHTML: when an XHTML document is sent as text/html, IE renders it with its HTML engine.)
So I was out of luck once again.

At that point I understood the reasoning of microformats.
Standard compliance is an important part of better Web, and there is no completely valid way to use custom tags.

But what about the second question? It seems the actual situation is way better than one could expect.
Firefox, IE7 and Opera 9 could all render custom tag styles correctly in a document served as text/html.
(To be really pedantic, I set the DTD and xmlns to XHTML.
After all, even if text/html documents are never parsed as XHTML, the MIME type is a server setting, not a document one.)
But IE7 has one important quirk: it does not render custom tag styles unless there is an xmlns for their namespace on the html tag.
No other tag is sufficient.
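
In other words, for the custom tags to be styled in IE7, the declaration has to sit exactly on the root element. The vcard prefix and namespace URI here are just illustrative:

```html
<!-- The vcard prefix/URI are illustrative. IE7 only applies styles to the
     custom tags when the xmlns declaration is on the html tag itself. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vcard="http://example.org/vcard">
  <head>
    <style type="text/css">
      vcard\:tel { font-weight: bold; }
    </style>
  </head>
  <body>
    <vcard:tel>+1.415.555.1212</vcard:tel>
  </body>
</html>
```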

What does it mean? It means that while one can make a document that is styled correctly in IE7,
a document fragment containing custom tags can not be reused without providing the namespace on the aggregating document.
But this is not an extremely important point, since when aggregating one does not actually control the styles either.

So, practically speaking, one can create a document that uses custom XML tags for the cost of formal document validity.
(The document can still be made formally valid by using custom DTD, but this will put IE and FF into quirks mode).

By the way, the challenge of adding custom tags to HTML was faced by MathML (mathematical markup language) community for years.
If you are interested, you can read these discussions:

Personally, I still see microformats as a step in wrong direction.
While hCard provides HTML with a way to express vCard semantics, I would prefer it to be just an HTML-compatible way, not the recommended one.
I see HTML as standard that needs support, but not popularized extensions.

I really like what is happening to Web.

Cool-new-ajaxy sites are often actually more friendly, useful and powerful.
Web development seems to be becoming way less hacky.
And a lot of standards that are gaining adoption are actually extremely useful (think about RSS).

But there is a group of new standards that I fail to understand.
They are called microformats.

In my understanding, there are three pillars of Ideal Web:

  1. Markup provides semantics
  2. Styles provide presentation
  3. Scripts provide behavior

These blocks are logical, understandable, maintainable and loosely coupled.
It is worth noting that all strict DTDs are here to make the markup truly semantic and help achieve such separation.
This is why I write <strong> instead of <b>.
And this is what helps Web 2.0 applications to be really rich without being messy.

And for me, microformats are viral semantics.
They infect markup and overload it with additional meaning, turning it into an ill, bloated mess.
The microformats wiki states:

Reuse the schema (names, objects, properties, values, types, hierarchies, constraints) as much as possible from pre-existing, established, well-supported standards by reference

For me it seems more honest to say overuse, since the most interesting thing about microformats is that there are no actual problems they solve.
Consider this fragment:

<span class="tel"><span class="type">Home</span> (<span class="type">pref</span>erred):
  <span class="value">+1.415.555.1212</span>
</span>

I would prefer:

<tel><type>Home</type>(<type>pref</type>erred):
   <value>+1.415.555.1212</value>
</tel>

Now it does not seem that somebody is reusing iron to hammer nails.

It is 2007. XML is here and it is supported. X in XHTML stands for extensible.
IE did not support CSS namespaces, but you could write styles like vcard\:tel for years.
And this syntax does not seem like a show stopper to me.

Actually, upon reading on the topic, I immediately googled for “microformats are stupid”.
The first thing I found was “Why I Hate Microformats?” by Robert Cooper.
He points to the same things I do, but he misses the fact that we had no need to wait for the IE7.

There is also a more interesting post Must Ignore vs. Microformats by Elliotte Rusty Harold.
The one point I do not agree with: Elliotte argues that XML does not have to be valid.
I do not see why properly namespaced XML in XHTML would not be valid, but I will have to test it myself.

Web would be better if microformat authors read more about XHTML and did some browser tests before pushing this standard.

As a side project, I am creating a new Project Type for Visual Studio using Managed Package Framework from VS 2005 SDK.
I have read an excellent post on the matter, so mostly it was a piece of cake.
But I had one problem: Build was not available as a menu item, and Build Selection was greyed out when I selected my custom project.

Adding a Target Name="Build" to the project template and specifying it in DefaultTargets did not help.

After some experiments I found a minimal *proj file that makes the Build menu item available (if you have already inherited from the MPF ProjectNode).
It is quite interesting:

<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup Condition=" '$(Configuration)' == 'Debug' " />
</Project>

It seems that the Configuration comparison gets parsed into the Configuration values for this project.
I am not sure whether Visual Studio or MPF does this.

Of course, if you actually want this menu item to work, you will have to add a default target, but that’s quite easy.

I wrote an article on Code Project about the way to implement generics in ITemplate.
I thought about including the whole article as a blog entry, but I am still fighting with code highlighting in blog.

An interesting fact about the Repeater solution is that you can use it to build any kind of generic asp.net control that is perfectly specifiable in mark-up.
I am writing a GenericControlBuilder that will not be specific to a single control type, but I haven’t had enough time to finish it yet.

The interesting question is how to implement a fully generic TemplateContainerAttribute. Dynamic creation of the attribute is not a problem at all (it is solved in the article).
But the question is how to determine the right TemplateContainer type in a GenericControlBuilder that has no hardcoded information about the control it creates.
I consider adding another attribute that will specify how control type arguments map to the TemplateContainer type arguments.

public class Repeater {
    …

    [GenericTemplateContainer(typeof(RepeaterItem<>), UseParentTypeArguments=true)]
    public ITemplate ItemTemplate

    …
}

This is just to show the basic idea — this syntax is way too limiting.