According to HTML5 spec, every attribute that starts with data- should be ignored by browser and considered a data needed for some scripts. For example:

<ul data-sortable="yes">…</ul>

Does it remind you of something? It is a simplified version of XML namespaces, but limited to non-hierarchical data and with much more probability of data clash.

John Resig, in the comments to his blog post on the matter, says that “The learning curve and failure rate are too high to purely XML-based markup, which is why the data-* attribute exists as a means to implementing this solution”. I completely disagree with the learning curve argument, and this is why:

Programming is complex.

It is easy to forget about the complexity while doing smart new stuff. And new cool things is not what I am talking about. Inventing new thing is easy, since you do not have to think about all limitations of the new ones. Making them work for all occasions, that is what is complex.

It is easy to praise HTML5 over XHTML. It is easy to praise microformats.

But all standards are here for a reason. It is easy to punch Microsoft, however first versions of Google’s Picasa didn’t support non-English text in labels, at all. Current version of Adobe Buzzword does not support Russian text. Current versions of Picasa and Opera do not support Drag&Drop outside of application. I have never seen any of such problems in MS applications. In fact, Visual Basic supported Unicode for ages, while relatively modern web languages stumble on it.

I have already blogged about microformats being a hack. And now I read about BBC removing microformats support due to problems with screen readers and semantics of abbr tag. Is it easy to remember that screen readers must be supported? No. Is it a shiny 2.0 kind-of-thing to think about them? No. But it is one of the multiple things you have to think about when making a wide-used standard.

Adding easy solutions when possible is a great thing. But it wrong to think of programmers as people who can only work with easy solutions. Flexibility is much more important than ease of use, and that’s the lesson that produced, for example, ASP.NET MVC. Also, easy ways should exists as shortcuts for flexible ones, not be orthogonal to them.

And while HTML may be about design, and text processing, and other things, data-* is about programming.

Since my previous post on microformats, I have decided that my opinion in this matter needs more evidence.
While I could collect all following information before writing the post, I didn’t have enough motivation to do the research.
But now, after writing it, I have my self-esteem as a motivation.

Ok, so I proposed using (namespaced) custom tags instead of overloading existing ones.
Now let’s go scientific and see what questions this solution may rise.

  1. Do modern browsers support CSS styling for unknown tags in HTML documents?
  2. Can these tags be added to document without breaking standard compliance (validity)?
  3. What possible problems can arise from using non-standard tags in modern browsers?

For practical purposes, these can be converted into two main questions

  1. Should custom tags work?
  2. Do custom tags work in modern browsers?

And the answers are:

  1. By default, no.
  2. Not perfectly, but yes.

Now let’s discuss it in detail.

To understand the first answer is to understand what exactly is HTML, what is XML and what is XHTML.
The most important (maybe obvious) point is: HTML is not a subset of XML and HTML is not compatible with XML.
HTML and XML are both a subsets of SGML, and SGML does not provide a way to mix different subsets within a single document.
So custom XML tags are not allowed in a HTML document.

While there are some solutions that allow arbitrary XML to be placed in a HTML document.
For example, Microsoft has XML Data Islands.
But they can be considered grammar hacks due to XML-HTML incompatibility.

Practically, however, HTML documents have to be viewed as “tag soup” by the browsers, so custom tags do not cause document rendering to fail.

So, if I am formally out of luck with HTML, what about XHTML?
For simplicity, one can view XHTML is a rewrite of HTML to follow XML rules.
So any custom tags should be allowed in XHTML if they are properly namespaced.

But there are a lot of problems with authoring XHTML.
While some of them are more like challenges (script/style syntax), one is extremely important.
The only way to tell modern browsers that that the document is XHTML is to serve it as application/xhtml+xml
(See this document for an excellent explanation).
And Internet Explorer doesn’t support XHTML at all — so it refuses to render application/xhtml+xml.
(It doesn’t mean IE can’t open XHTML. When XHTML document is sent as text/html, IE renders it with HTML engine).
So I was out of luck once again.

At that point I understood the reasoning of microformats.
Standard compliance is an important part of better Web, and there is no completely valid way to use custom tags.

But what is with the second question? It seems that actual situation is way better than one could suppose.
Firefox, IE7 and Opera 9 all could render the custom tags style correctly in the document served as text/html.
(To be really pedantic, I set DTD and xmlns to XHTML.
After all, even if text/html documents are never parsed as XHTML, MIME Type is a server setting, not document one.)
But IE7 has a one important characteristic — it does not render custom tag styles unless there is an xmlns for their namespace on html tag.
No other tag is sufficient.

What does it mean? It means that while one can make a document that is styled correctly in these IE7,
document part containing custom tags can not be reused without providing a namespace on the aggregating document.
But it not an extremely important point, since for aggreagation one does not actually control styles as well.

So, practically speaking, one can create a document that uses custom XML tags for the cost of formal document validity.
(The document can still be made formally valid by using custom DTD, but this will put IE and FF into quirks mode).

By the way, the challenge of adding custom tags to HTML was faced by MathML (mathematical markup language) community for years.
If you are interested, you can read these discussions:

Personally, I still see microformats as a step in wrong direction.
While hCard provides HTML with a way to express the vCard semantics, I would prefer it to be just a HTML-compatible way, not the recommended one.
I see HTML as standard that needs support, but not popularized extensions.

I really like what is happening to Web.

Cool-new-ajaxy sites are often actually more friendly, useful and powerful.
Web development seem to become way less hacky.
And a lot of standards that are gaining adoption are actually extremely useful (think about RSS).

But there is a group of new standards that I fail to understand.
They are called microformats.

In my understanding, there are three pillars of Ideal Web:

  1. Markup provides semantics
  2. Styles provide presentation
  3. Scripts provide behavior

These blocks are logical, understandable, maintanable and loosely coupled.
It is worth noting that all strict DTDs are here to make the markup truly semantic and help achieve such separation.
This is why I write <strong> instead of <b>.
And this is what helps Web 2.0 applications to be really rich without being messy.

And for me, microformats are viral semantics.
They infect markup and overload it with additional meaning, turning it into an ill, bloated mess.
The microformats wiki states:

Reuse the schema (names, objects, properties, values, types, hierarchies, constraints) as much as possible from pre-existing, established, well-supported standards by reference

For me it seems more honest to say overuse, since the most interesting thing about microformats is that there are no actual problems they solve.
Consider this fragment:

<span class="tel"><span class="type">Home</span> (<span class="type">pref</span>erred):
  <span class="value">+1.415.555.1212</span>
</span>

I would prefer:

<tel><type>Home</type>(<type>pref</type>erred):
   <value>+1.415.555.1212</value>
</tel>

Now it does not seem that somebody is reusing iron to hammer nails.

It is 2007. XML is here and it is supported. X in XHTML stands for extensible.
IE did not support CSS namespaces, but you could write styles like vcard\:tel for years.
And this syntax does not seem like a show stopper to me.

Actually, upon reading on topic, I immediatelly googled for “microformats are stupid”.
The first thing I found was Why I Hate Microformats? by Robert Cooper.
He points to the same things I do, but he misses the fact that we had no need to wait for the IE7.

There is also a more interesting post Must Ignore vs. Microformats by Elliotte Rusty Harold.
The one point I do not agree is that Elliotte argues that XML does not have to be valid.
I do not see why the properly namespaced XML in XHTML would not be valid, but I will have to test it myself.

Web would be better if microformat authors read more about XHTML and did some browser tests before pushing this standard.