Sunday 30 December 2007

Metadata-driven classification

With a (very) large digital music library and a regular listening habit, classifying my tracks and albums to suit my needs has always been an epic quest with no perfect solution. The best thing I've found so far is foobar2000's media library and its various treeview visualization plugins, which allow complex custom sorting, filtering and display (think of them as scriptable iTunes custom playlists).

Back to the point... I recently found two related articles that make many interesting observations about the classification of digital items:
Ontology is Overrated: Categories, Links, and Tags
Hierarchy vs Facets vs Tags

Basically, they explain how "real world sorting" (folders and shelves) proves powerless when applied to a large and heterogeneous collection of items. Tags and facets are more flexible and powerful solutions (see Nobel prize winners as an example of facet-powered browsing).
That said, facets and tags bring new problems: they fail to give the viewer any hierarchical information (we all tend to organize stuff hierarchically when we're lost... how about living with a classification system with no hierarchy at all?).
What's more, tagging items independently deprives us of the mental effort we make when grouping and sorting stuff into folders. As a result, we have a much poorer mental view of a tagged collection than we would of a classified one, but we can access its contents much more efficiently thanks to the tag and facet engine.
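
To make the difference concrete, here is a minimal sketch of facet and tag browsing over a tiny music collection. The track records, field names and helper functions are all made up for illustration; they do not come from foobar2000 or from the articles above.

```javascript
// Hypothetical track records: each facet (genre, year) is an independent
// field, and tags are free-form labels attached to each track.
const tracks = [
  { title: "Track A", genre: "Soundtrack", year: 2006, tags: ["orchestral", "vocal"] },
  { title: "Track B", genre: "Soundtrack", year: 2007, tags: ["vocal"] },
  { title: "Track C", genre: "Rock", year: 2007, tags: ["live"] },
];

// Faceted browsing: keep the items matching every requested facet value.
function filterByFacets(items, facets) {
  return items.filter((item) =>
    Object.entries(facets).every(([facet, value]) => item[facet] === value)
  );
}

// Tag browsing: keep the items carrying all the requested tags.
function filterByTags(items, wanted) {
  return items.filter((item) => wanted.every((tag) => item.tags.includes(tag)));
}

// Facet counts: how many items fall under each value of one facet.
function countFacet(items, facet) {
  const counts = {};
  for (const item of items) {
    counts[item[facet]] = (counts[item[facet]] || 0) + 1;
  }
  return counts;
}
```

Any combination of facets can be queried in any order (genre then year, or year then genre), which a single folder tree cannot offer. The price is the one described above: nothing in this structure tells you how the collection is organized as a whole.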


This post has been completed while listening to : nothing !

Sunday 9 December 2007

To all you "IT neighbours"

Don't tell me that it has never happened: a few knocks on the door, your neighbour with a sorry look on her face

"I'm really sorry to ask you this, but since you work in the IT field, I wondered if..."

and ten minutes later you're entrusted with a computer crippled by some virus that came bundled with a promising eMule download.
  • First thing to do is to get half the job done by finding out the name of the virus. If it tries to open a suspicious webpage or display a specific message, do a web search and try to find a matching description, preferably on Symantec's threat database.

    NB: to be clear, I'm not indirectly recommending the use of Symantec software; all I'm saying is that their database is often accurate and comprehensive, which helps a lot when eliminating viruses manually.

  • Now that you know which files are responsible for the mess and which registry keys need to be removed, don't bother rebooting into safe mode five times in a row. Just disable System Restore on all drives and boot from the Ultimate Windows Boot CD. This loads a Windows OS from a CD (on the very same principle as a Linux Live CD) and lets you repair the installed OS without loading a byte of it. You can edit the registry, move files, run various antivirus programs, fix low-level hard drive issues (MBR rewriting, partitioning) and much more.
Give it a try if you're not convinced yet - it cannot take longer than struggling with a well-designed rootkit. You'll thank me when your neighbour comes back with a smile on her face and a box of pâtisseries :)


This post has been completed while listening to :

Ar tonelico II Hymmnos Concert Side Red "Flame ~ Homura" (Various artists)

Sunday 2 December 2007

Speeding up Dojo 0.9

Remember my previous article about Dojo? What had to happen finally happened: one of our competitors released a website with a rich form powered by another JavaScript framework. The catch? Their form is faster than ours. Period. I've been told that investors are the first to complain about the loading time of our forms. The culprit was well known: Dojo and its voracious parsing/initialization lag. My job was to speed up the beast.

Getting into the details

My first guess was that too much time was spent on two things :

  • loading the various files/packages needed to initialize all the Dojo widgets
Each class of Dojo widget requires its own package (e.g. putting a ValidationTextBox on a form requires loading TextBox.js and validate.js, each of which requires a few more scripts to be loaded...)

  • parsing the DOM tree to "widgetify" the elements marked as such.
I assumed that too much CPU time was spent transforming standard DOM input nodes into Dojo input nodes.

e.g.
<INPUT TYPE="text" dojoType="dijit.form.TextBox" id="firstname" />
is turned into :
<input id="firstname" class="dijitInputField dijitFormWidget" type="text" tabindex="0" maxlength="999999" size="20" name="" autocomplete="off" dojoattachevent="onfocus,onkeyup,onkeypress:_onKeyPress" dojoattachpoint="textbox,focusNode" style="" widgetid="nom" value="" valuenow="" disabled="false"/>
after parsing.
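
To picture what that parsing pass has to do, here is a deliberately simplified model. It is not Dojo's actual implementation: the tree is made of plain objects rather than real DOM nodes, and findWidgetNodes is a hypothetical helper. It only shows the idea of walking every node and picking out the ones marked with a dojoType attribute.

```javascript
// Simplified stand-in for a DOM tree: plain objects with attrs and children.
// Visits every node under `node` and collects those carrying a dojoType.
function findWidgetNodes(node, found = []) {
  if (node.attrs && node.attrs.dojoType) {
    found.push(node);
  }
  for (const child of node.children || []) {
    findWidgetNodes(child, found);
  }
  return found;
}

// Hypothetical page: a header plus a form holding two widgets.
const page = {
  tag: "body",
  children: [
    { tag: "div", attrs: { id: "header" }, children: [] },
    {
      tag: "form",
      attrs: { id: "myForm" },
      children: [
        { tag: "input", attrs: { id: "firstname", dojoType: "dijit.form.TextBox" } },
        { tag: "input", attrs: { id: "email", dojoType: "dijit.form.ValidationTextBox" } },
      ],
    },
  ],
};

// Each node found here would then be replaced by its widget markup.
const widgets = findWidgetNodes(page);
```

In this model the cost grows with the number of nodes visited, which is what made me suspect the parsing step in the first place.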

Getting asynchronous

As written in the YUI team's study, browsers only download two resources at a time from the same domain. This means you can easily speed up downloads by serving some resources from additional domains (hence the "transferring data from static.foobar.com" message that sometimes appears in your status bar while you're browsing foobar.com - they optimize the loading of their static content by serving it from a different domain).

Applying that rule to Dojo has been a piece of cake since version 0.9, thanks to the AOL CDN. Instead of including your own local copy of Dojo, follow the guidelines and include the AOL-hosted version. Not only will your scripts benefit from the CDN and load at the same speed from any corner of the planet but, more importantly, they will be loaded from an AOL domain. As a consequence, your browser will load the Dojo scripts at the same time as the other resources of your page.
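
The effect of the two-connections rule can be approximated with a toy model. The sketch below is an assumption-laden simplification: it ignores file sizes, latency and any global connection cap, and simply counts how many successive "rounds" of downloads are needed when each hostname allows a fixed number of parallel connections. estimateRounds and the URLs are made up for the example.

```javascript
// Group URLs by hostname, then estimate how many sequential rounds of
// downloads are needed when each host allows `perHostLimit` connections.
function estimateRounds(urls, perHostLimit = 2) {
  const perHost = {};
  for (const url of urls) {
    const host = new URL(url).hostname;
    perHost[host] = (perHost[host] || 0) + 1;
  }
  // Hosts are fetched in parallel, so the busiest host sets the total.
  const rounds = Object.values(perHost).map((n) => Math.ceil(n / perHostLimit));
  return Math.max(0, ...rounds);
}

// Hypothetical page: six resources, all served from the same hostname.
const singleHost = [
  "http://www.foobar.com/page.css",
  "http://www.foobar.com/logo.png",
  "http://www.foobar.com/a.js",
  "http://www.foobar.com/b.js",
  "http://www.foobar.com/c.js",
  "http://www.foobar.com/d.js",
];

// Same page with the four scripts moved to a second hostname.
const twoHosts = [
  "http://www.foobar.com/page.css",
  "http://www.foobar.com/logo.png",
  "http://static.foobar.com/a.js",
  "http://static.foobar.com/b.js",
  "http://static.foobar.com/c.js",
  "http://static.foobar.com/d.js",
];
```

In this model the single-host page needs three rounds and the split page needs two. Real browsers behave less neatly, but the direction of the gain is the same.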

Disabling autoparsing

Parsing exists because Dojo doesn't know where to look for its widgets, so it scans the whole document. In Dojo 0.9, autoparsing can be disabled by replacing djConfig="parseOnLoad:true" with djConfig="parseOnLoad:false" on the script tag that includes dojo.js in the document. Parsing can then be triggered manually on all the children of a given DOM node:
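// The parser is not part of the Dojo base, so it must be required explicitly
dojo.require("dojo.parser");

// Run init once the page and the required modules have loaded
dojo.addOnLoad(init);

function init()
{
   // Parse only the subtree rooted at the form, not the whole document
   dojo.parser.parse(dojo.byId('myForm'));
}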

As for the results...



Enabling parallel downloads from the CDN obviously speeds up the whole thing (the whole page loads in 2.57s instead of 6.38s, roughly 60% less).

However, contrary to what I expected, parsing the DOM tree manually takes slightly more time than letting Dojo parse automatically while loading the document. A bunch of articles about Dojo still claim that manual parsing is faster; however, if you look closer, these articles are all about Dojo 0.4 (especially if they refer to "searchIds", now deprecated). Autoparsing has undergone many optimizations since then, and version 0.9 doesn't need such tricks anymore.
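
If you want to run this kind of comparison on your own forms, a small timing helper is enough. This is only a sketch of one possible approach, not a description of how the figures above were obtained; timeMs is a hypothetical name.

```javascript
// Run `fn` once and return the elapsed wall-clock time in milliseconds.
function timeMs(fn) {
  const start = Date.now();
  fn();
  return Date.now() - start;
}

// In a page that has loaded Dojo 0.9 with parseOnLoad disabled, it could be
// used like this (left commented out since it needs a browser and Dojo):
//
// dojo.addOnLoad(function () {
//   var elapsed = timeMs(function () {
//     dojo.parser.parse(dojo.byId('myForm'));
//   });
//   console.log('manual parsing took ' + elapsed + ' ms');
// });
```

The helper runs the function a single time on purpose, since parsing the same node twice would try to create the same widgets again. For steadier numbers, reload the page several times and average the readings.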


This post has been completed while listening to :

Don - Original Soundtrack
Don OST (Shankar Ehsaan Loy)