Google Analyzing HTML
Thursday, January 26th, 2006
Google has done a pretty interesting analysis of the HTML markup of about a billion web pages. They parsed every page and published some nice graphs showing the most used elements, tags, classes and attributes (unfortunately, these graphs only show up in Firefox 1.5+).
It is pretty interesting to see that almost all pages at least get the basics right: they define html, head, title and body. Most pages contain at least one a element, but the frightening part is that more than half use a target attribute, meaning they open another window for you.
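To give an idea of how such per-page numbers can be gathered, here is a minimal sketch using Python's standard html.parser module. It is not Google's actual method, which the report does not describe in detail. The names ElementTally, pages_using and sample_pages are made up for this illustration, and the two sample pages are invented.

```python
from collections import Counter
from html.parser import HTMLParser


class ElementTally(HTMLParser):
    """Collects the distinct elements and (element, attribute) pairs on one page."""

    def __init__(self):
        super().__init__()
        self.elements = set()
        self.attributes = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        self.elements.add(tag)
        for name, _value in attrs:
            self.attributes.add((tag, name))


def pages_using(pages):
    """Count how many pages use each element and each (element, attribute) pair."""
    element_pages = Counter()
    attribute_pages = Counter()
    for html in pages:
        tally = ElementTally()
        tally.feed(html)
        tally.close()
        # Sets ensure a page counts once, however often it repeats an element.
        element_pages.update(tally.elements)
        attribute_pages.update(tally.attributes)
    return element_pages, attribute_pages


# Hypothetical sample pages, purely illustrative.
sample_pages = [
    '<html><head><title>A</title></head><body>'
    '<a href="x" target="_blank">x</a></body></html>',
    '<html><head><title>B</title></head><body>'
    '<a href="y">y</a><br></body></html>',
]
element_pages, attribute_pages = pages_using(sample_pages)
```

With these two sample pages, both count towards the a element, but only one counts towards the target attribute on a. Counting pages rather than occurrences is the design choice here, because the statements above are about how many pages use something, not how often.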
Natural markup using paragraphs (p) is used less often than the br tag. There is also a huge number of pages that use table, but apparently only half of them put any cells in their tables!
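The table observation is easy to check on your own pages. The following is a rough sketch, again with Python's html.parser, that flags a page which contains a table element but no cells. The class TableCellCheck and the function table_without_cells are hypothetical names, and treating both td and th as cells is my assumption, not necessarily how the report counted them.

```python
from html.parser import HTMLParser


class TableCellCheck(HTMLParser):
    """Records whether a page has a table and whether it has any table cell."""

    def __init__(self):
        super().__init__()
        self.has_table = False
        self.has_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.has_table = True
        elif tag in ("td", "th"):
            # Assumption: header cells (th) count as cells too.
            self.has_cell = True


def table_without_cells(html):
    """Return True if the page uses table but never opens a td or th."""
    checker = TableCellCheck()
    checker.feed(html)
    checker.close()
    return checker.has_table and not checker.has_cell
```

Note that this looks at the page as a whole, so a page with one empty table and one filled table would not be flagged.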
A lot of people use presentational attributes on their web pages. This is most obvious in the attributes used on the body element. It is also interesting to see that authors don’t really care about standards: of the top twenty body attributes, nine are invalid, and five have been deprecated for over eight years.
Have a look through the graphs; they offer some interesting insights into how the web is currently built. They also provide plenty of food for thought for the new HTML5 standard.