
miscoded

the web is a hack


Introduction

My journal is Opera-related and technical. It will cover the main obstacles we come across when we use Opera on the Web as it is - the standard violations, the browser incompatibilities, the sniffers and faulty scripts. That is the whole mess a poor browser has to make sense of, and believe me, Opera is doing a brilliant job.

HTML5 and invalid documents - the great misunderstanding


People keep complaining about HTML5's error handling. It looks like a lot of people believe that because the standard includes error handling, all content will be considered valid.

This belief is wrong, and repeating it doesn't make it true. Yet even Sir Tim Berners-Lee himself seems concerned that HTML5 represents a change of philosophy, from improving the web to letting it fester while describing it.


This is probably the greatest misunderstanding about HTML5. Let's get this straight:
  • Understanding error handling is an absolute requirement for improving HTML and the Web while being compatible with current content.
  • Invalid documents are still invalid.
  • HTML5 browsers will not "gloss over" invalidity any more than the current HTML4 browsers already do.

On the contrary, I believe that the level of detail in HTML5's error handling will make browsers and validators report more useful error messages. This will make it easier to write valid HTML.

Look at the spec. Right now I find 178 instances of the expression "parse error" in the spec text. These parse errors are validity errors that validators will report, and browsers may report, to the user. (The spec can't require browsers to report them, because how to do so is a UI decision, but I'm fairly sure that Firefox, Safari and Opera will all use their existing error consoles / web developer tools to show HTML5 parse errors. After all, these errors should be so useful that it would be a competitive drawback for a developer tool not to show them.)
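To illustrate the difference between recovering from an error and accepting it as valid, here is a toy tag checker. It is a deliberately simplified sketch, nowhere near the real HTML5 tree-construction algorithm (the function name and the short void-element list are made up for this example), but it shows the principle: it recovers from misnested or stray end tags much like a browser would, yet still records a parse error for each one.

```javascript
// Toy illustration only: NOT the HTML5 tree-construction algorithm.
// It recovers from bad end tags (so a "page" is still produced) but
// records a parse error for every problem, the way a validator or an
// error console could report them.
const VOID_ELEMENTS = new Set(["br", "img", "hr", "input", "meta", "link"]);

function checkTags(html) {
  const errors = [];
  const open = []; // stack of currently open element names
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const name = match[1].toLowerCase();
    const isEndTag = match[0].startsWith("</");
    if (!isEndTag) {
      if (!VOID_ELEMENTS.has(name)) open.push(name);
      continue;
    }
    if (!open.includes(name)) {
      // Stray end tag: ignore it for recovery, but report it.
      errors.push(`parse error: stray </${name}> at offset ${match.index}`);
      continue;
    }
    // Recover by implicitly closing everything above the matching element.
    while (open[open.length - 1] !== name) {
      errors.push(`parse error: <${open.pop()}> implicitly closed by </${name}>`);
    }
    open.pop();
  }
  for (const name of open) {
    errors.push(`parse error: <${name}> never closed`);
  }
  return errors;
}

console.log(checkTags("<p><b><i>x</b></i></p>"));
console.log(checkTags("<p>ok<br></p>")); // no errors
```

Both fragments still "work" after recovery, but only the misnested one produces errors: the implicitly closed `<i>` and the stray `</i>`. Recovery and reporting are separate jobs, which is exactly why detailed error handling doesn't make invalid markup valid.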

Having web browsers and validators report the same errors will help authors understand HTML and well-formedness. Today, authors who try the validator are baffled when it reports lots of problems in a document that works fine in browsers, which don't complain about any errors. This confuses authors and makes them distrust or ignore the validator's warnings.

Tomorrow, HTML5-compliant validators and browsers will report the same errors, and HTML authors will be less confused and more enlightened as a result. Hence, specifying error handling in as much detail as the HTML5 spec does should in fact help improve the quality of the markup out there on the web.

How Google Docs prints

Web browsers haven't focused much on printing. The web is so much nicer on the screen than on a flake of dead trees..

Hence, web browsers are not very good at printing. For example, they have the annoying habit of splattering URLs and dates across the footer of a page. (Some versions of Opera are known to be so insistent on including a URL that they grab a random URL from a recently seen page and add it to the footer even when you print an e-mail. There is quite likely a story on some blog somewhere in the universe about somebody who got really, really embarrassed by that bug..)

So what do you do if you write an online word processor and want your users to be able to print beautifully? Here is what Google Docs does when you click their "Print" button:

  1. Saves your document to the server
  2. Converts it on the server - on the fly - from HTML to PDF
  3. Creates a hidden Adobe Acrobat plugin instance inside the editor tab
  4. Loads the newly converted PDF into the plugin
  5. Triggers the Acrobat print dialog
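The flow above can be sketched as a short async function. This is only an outline under stated assumptions: it is not Google's code, and every step function (`saveDocument`, `convertToPdf`, `createHiddenPdfPlugin`, `loadPdf`, `showPrintDialog`) is a hypothetical placeholder passed in by the caller, not a real Google or Adobe API.

```javascript
// Outline of the "print via server-side PDF" flow. All step functions are
// hypothetical placeholders supplied by the caller.
async function printViaServerPdf(docHtml, steps) {
  const docId = await steps.saveDocument(docHtml);  // 1. save to the server
  const pdfUrl = await steps.convertToPdf(docId);   // 2. convert HTML -> PDF on the server
  const plugin = steps.createHiddenPdfPlugin();     // 3. hidden PDF plugin instance in the tab
  await steps.loadPdf(plugin, pdfUrl);              // 4. load the converted PDF into it
  steps.showPrintDialog(plugin);                    // 5. let the plugin show its print dialog
}
```

Passing the steps in as parameters keeps the sketch testable with fakes; mostly it shows how many moving parts sit between the "Print" button and the printer.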

Wow. An impressive hack.

"Oh what a tangled web we weave, When first we practice workarounds"..

Unfortunately opera: is the new chrome:

Opera 9.62 is out. Please make SURE you upgrade as soon as possible, as we've just fixed one of the worst security issues I can remember having seen in Opera.

A while ago, security researchers were forcing Mozilla to play catch-up as they figured out several ways web content could inject JavaScript into the chrome: context, meaning it would run with the privileges of the Firefox user interface. At the time it seemed much safer to be Opera, which does not have a JS/XUL-based UI.

Not so fast.. Some of Opera's features have now gravitated towards HTML+JS-based screens in pages shown with the opera: protocol. The most powerful one is opera:config, and since all opera: pages can interact, a minor XSS exploit in opera:historysearch became an extremely bad security problem.

So, opera: is the new chrome: and we have to deal with that and lock any opera: resource down accordingly. :frown:
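The opera:historysearch hole was an XSS bug: user-controlled text ending up as markup inside a privileged page. Here is a minimal sketch of the standard first line of defence, assuming the search term is written into the page as HTML. The function names are hypothetical, and this is not Opera's actual fix.

```javascript
// Escape the characters that can change the meaning of HTML text or
// attribute values, so a search term is rendered as text, never as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, or the entities below get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical renderer for a search-results heading.
function renderSearchHeading(term) {
  return "<h2>Results for " + escapeHtml(term) + "</h2>";
}

console.log(renderSearchHeading("<script>alert(1)</script>"));
// -> <h2>Results for &lt;script&gt;alert(1)&lt;/script&gt;</h2>
```

Escaping like this keeps a search term from turning into script, but on pages that can reach opera:config it is only a baseline; locking down the opera: resources themselves still matters.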

..and the winners are..

Regarding the competition in the last post, I decided to refactor it into two categories: the shortest and the best.

The shortest valid entry was, as far as I can see, qwo's alert(frames);. While the toString() output usually just exposes the browser's internal classes without showing real compatibility issues, this case is actually interesting because it gives away that in some browsers, "window.frames" is just an alias for "window". Opera doesn't do this, and guess what - it does cause us real compatibility problems, and we have a bug open on fixing it!

BTW qwo, it could be even shorter without the semicolon :smile: .

The entry that packed the most "real" quirks into the least code was arantius' setTimeout(alert,0). I liked the minimalness of just passing the built-in alert method directly to setTimeout() - though perhaps it would expose the same quirks and be even shorter without the '0' argument?

Congratulations, and thanks to everyone who contributed. And before I forget: the winners need to PM me their snail mail address if they want to receive the prize :smile:.

shortest incompatible script challenge


We all know that the interoperability situation on the Web has been abysmal. Web developers everywhere are loudly complaining about time and money wasted on browser differences. Standards bodies that are meant to solve the interoperability problem become battlegrounds of special interests, or create ivoryish spec-monsters that end up fragmenting the web even more. The dazzling bells and whistles of plugins threaten open standards..

So, let's have some incompatibility fun.

Here's a challenge for JavaScript-skilled readers: Who can come up with the shortest possible JavaScript that produces 4 different results in the top 4 engines?

(For the purpose of this exercise let's define the top 4 browsers as IE, Firefox, Safari/WebKit and Opera - latest available final versions).

The winner gets fame and fortune. (Well, I think I can afford a reward of ISK1000 these days if anyone is interested - certainly our friends in Iceland really need someone wanting to buy their money.. :p So the grand prize is a souvenir 1000 krónar bill from Jon's country of origin.)

Ladies and gentlemen, post your answers in the comments.

My O statuses

Some of my past status messages on My Opera - just because I don't like old messages disappearing when replaced, and thought some of them were good enough to keep somewhere.. This blog post is "somewhere". Reverse chronological order.

  • Rich text is just plain text with more money
  • Those who do not invent wheels are stuck re-inventing them
  • work·ing group [wur-king groop] -noun. 1. The intersection of web technology and religion
  • Never attribute to stupidity that which can be adequately explained by deadlines.
  • It can be much harder to figure out why something works than why something is broken.

Feel free to improve on or re-use them :smile:

How to cook tag soup with XSLT


Working for Opera Software's QA department gives you in-depth perspectives on the web's wild and varied coding practises. I still wasn't prepared for the curious solutions that power the menu on the new Israeli rail website.

The XSLT markup/programming language is widely used to transform one sort of DOM into another - for example turning the DOM of a generic XML file into valid XHTML. Much of the benefit is that you're working on DOM trees - making it hard or impossible to create syntactically invalid pages.
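The tree-based principle can be sketched in JavaScript rather than XSLT. Assume a made-up tree format (element nodes as `{ name, attrs, children }` objects, text nodes as plain strings). Because the serializer always writes start and end tags in pairs and escapes all text and attribute values, it cannot leave an element unclosed or let text turn into markup. For brevity it ignores void elements such as `<br>` and does not validate element names.

```javascript
// Minimal sketch: build markup as a tree and serialize it, instead of
// concatenating strings. The tree format is invented for illustration;
// it is not a DOM API.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

function serialize(node) {
  if (typeof node === "string") return escapeText(node);
  const attrs = Object.entries(node.attrs || {})
    .map(([k, v]) => ` ${k}="${escapeAttr(String(v))}"`)
    .join("");
  const children = (node.children || []).map(serialize).join("");
  // Start and end tags are always emitted as a pair.
  return `<${node.name}${attrs}>${children}</${node.name}>`;
}

const row = {
  name: "tr",
  attrs: { class: "nw-2r" },
  children: [{ name: "td", attrs: { class: "nw-2c" }, children: ["A & B <C>"] }],
};
console.log(serialize(row));
// -> <tr class="nw-2r"><td class="nw-2c">A &amp; B &lt;C&gt;</td></tr>
```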

Diving into the source code shows that the JavaScript coders working on the Rail site were asleep during their education's "what's the point of XSLT" lesson. The coding is unbelievable. It's more like an XML parser/serializer stress test than a production site. Now, I don't really know XSLT, and trying to debug this confirms my impression that it must be one of the worst programming languages mankind has invented - but the point of this script is to generate HTML with XSLT *string concatenation*?!?? Look at this:

<xsl:value-of select="$attribute-name"/>="<xsl:call-template name="inner-attribute-text-value"><xsl:with-param name="attribute-value" select="$attribute-value"/></xsl:call-template>"

or

<xsl:template name="inner-text-tag-open"><xsl:text disable-output-escaping="yes">&lt;</xsl:text></xsl:template>
<xsl:template name="inner-text-element-close">
<xsl:param name="element-name"/><xsl:call-template name="inner-text-tag-open"/>/<xsl:value-of select="$element-name"/><xsl:call-template name="inner-text-tag-close"/></xsl:template>
<xsl:template name="inner-text-tag-close"><xsl:text disable-output-escaping="yes">></xsl:text></xsl:template>


Yes, all that to create a text node containing e.g. "</div>" in a DOM that they will serialize, only to parse it again by setting innerHTML on some poor element..

When they in their wisdom chose to generate markup inside text nodes with their XSLT, they ran into the familiar problem: when is < going to start a tag, and when is it going to live in a text node? Hence, < is sometimes escaped as an 'lt' entity to create proper text nodes with HTML source-as-text in them (see for example the instance of &lt; in the code above). Now, of course, when they set innerHTML they do not want this entity to show up as a literal < character, so they do some pre-processing: every entity they want turned into a real < or > before setting innerHTML has a comment node next to it:

<!--nwlt-->&lt;TR class="nw-2r"&gt;<!--nwgt--><!--nwlt-->&lt;TD class="nw-2c"&gt;<!--nwgt-->


and their pre-processing is a simple string replace:

sHtml = sHtml.replace(/\<!--nwlt--\>&lt;/g,"<").replace(/&gt;\<!--nwgt--\>/g,">").replace(/\<[\/]?tbody\>/gi,"");
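To see what the replace chain actually does, here it is applied to the sample markup quoted above. The three regular expressions are copied from the site's code; the wrapper function name `unescapeMarkedTags` is made up for this example.

```javascript
// The site's pre-processing step, wrapped in a function (name is ours).
function unescapeMarkedTags(sHtml) {
  return sHtml
    .replace(/\<!--nwlt--\>&lt;/g, "<") // "<!--nwlt-->&lt;" -> "<"
    .replace(/&gt;\<!--nwgt--\>/g, ">") // "&gt;<!--nwgt-->" -> ">"
    .replace(/\<[\/]?tbody\>/gi, "");   // drop bare <tbody> and </tbody> tags
}

const sample =
  '<!--nwlt-->&lt;TR class="nw-2r"&gt;<!--nwgt-->' +
  '<!--nwlt-->&lt;TD class="nw-2c"&gt;<!--nwgt-->';

console.log(unescapeMarkedTags(sample));
// -> <TR class="nw-2r"><TD class="nw-2c">
```

Only entities flanked by the marker comments are turned back into tags; an unmarked `&lt;b&gt;` stays escaped, which is presumably the whole point of the comments.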


(Why they hate the poor TBODY so much that they must strip it from the markup, even though the browser will re-generate it in the DOM as soon as innerHTML is parsed, I can't even begin to imagine.)

If you thought XML-based toolchains and processes were going to make the Web a saner place, think again. We have now seen that in the right hands, XSLT is just another recipe for tag soup.