Sunday 30 December 2007

Metadata-driven classification

With a (very) large digital music library and a regular listening habit, classifying my tracks and albums to suit my needs has always been an epic quest with no perfect solution. The best thing I've found so far is foobar2000's media library and its various treeview visualization plugins, which allow complex custom sorting, filtering and display (think of it as scriptable iTunes custom playlists).

Back to the point... I recently found two related articles that develop many interesting observations and concepts around the theme of classification of digital items :
Ontology is Overrated: Categories, Links, and Tags
Hierarchy vs Facets vs Tags

Basically, they explain how "real world sorting" (folders and shelves) proves powerless when you try to apply it to a large and heterogeneous collection of items. Tags and facets turn out to be more flexible and powerful solutions (see Nobel prize winners as an example of facet-powered browsing).
That said, new problems come with facets and tags : they fail to give hierarchical information to the viewer (we all tend to organize stuff hierarchically when we're lost... how about living with a classification system with no hierarchy at all ?).
What's more, tagging items independently deprives us of the mental effort we make when we group and sort stuff to put it into folders. As a result, we have a much poorer mental picture of a tagged collection than we would of a classified one, even though we can access its contents much more efficiently thanks to the engine.
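
To make the idea of facets a bit more concrete, here is a minimal sketch in JavaScript. The track list, the facet names and both function names are made up for the illustration; they are not taken from foobar2000 or from the articles above.

```javascript
// Hypothetical sample data: each track carries independent facets
// instead of living in a single folder.
var tracks = [
  { title: "Track A", facets: { genre: "jazz", decade: "1950s", mood: "calm" } },
  { title: "Track B", facets: { genre: "jazz", decade: "1970s", mood: "upbeat" } },
  { title: "Track C", facets: { genre: "soundtrack", decade: "1970s", mood: "calm" } }
];

// Keeps the items that match every facet value given in `selection`.
function filterByFacets(items, selection) {
  return items.filter(function (item) {
    return Object.keys(selection).every(function (name) {
      return item.facets[name] === selection[name];
    });
  });
}

// Counts how many items fall under each value of one facet,
// which is what a facet browser shows next to each choice.
function countByFacet(items, name) {
  var counts = {};
  items.forEach(function (item) {
    var value = item.facets[name];
    counts[value] = (counts[value] || 0) + 1;
  });
  return counts;
}
```

The facets can be combined in any order (decade then mood, or mood then genre), whereas a folder tree forces one fixed order on everybody.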


This post has been completed while listening to : nothing !

Sunday 9 December 2007

To all you "IT neighbours"

Don't tell me that it has never happened : a few knocks on the door, your neighbour with a sorry look on her face

"I'm really sorry to ask you that, but since you work in the IT field, I wondered if..."

and ten minutes later you're entrusted with a computer crippled by some virus that came bundled with a promising eMule download.
  • First thing to do is to get half the job done by finding out the name of the virus. If it tries to open a suspicious webpage or display a specific message, do a web search and try to find a matching description, preferably on Symantec's threat database.

    NB : let me be clear here : I'm not indirectly recommending the use of Symantec software; all I'm saying is that their database is often accurate and comprehensive, which helps a lot when eliminating viruses manually.

  • Now that you know which files are responsible for the mess, and which registry keys are to be removed, don't bother rebooting in safe mode five times in a row. Just disable System Restore on all drives and boot from the Ultimate Boot CD for Windows. This will load a Windows OS from a CD (on the very same principle as a Linux Live CD) and allow you to repair the resident OS without loading a byte of it. You can edit the registry, move files, run various antivirus programs, fix low-level hard drive issues (MBR rewriting, partitioning) and much more.
Give it a try if you're not convinced yet - it can't take longer than struggling with a well-designed rootkit. You'll thank me when your neighbour comes back with a smile on her face and a box of pâtisseries :)


This post has been completed while listening to :

Ar tonelico II Hymmnos Concert Side Red "Flame ~ Homura" (Various artists)

Sunday 2 December 2007

Speeding up Dojo 0.9

Remember my previous article about Dojo ? What had to happen finally happened : one of our competitors released a website with a rich form powered by another Javascript framework. The catch ? Their form is faster than ours. Period. I've been told that investors are the first to complain about the loading time of our forms. The culprit was well-known : Dojo and its voracious parsing/initialization lag. My job was to speed up the beast.

Getting into the details

My first guess was that too much time was spent on two things :

  • loading the various files/packages needed to initialize all the Dojo widgets
Each class of Dojo widget requires its own package (e.g. putting a ValidationTextBox on a form requires the loading of TextBox.js and validate.js, each of them requiring a few more scripts to be loaded...)

  • parsing the DOM tree to "widgetify" the elements marked as such.
I assumed that too much CPU time was spent transforming standard DOM input nodes into Dojo input nodes.

e.g.
<INPUT TYPE="text" dojoType="dijit.form.TextBox" id="firstname" />
is turned into :
<input id="firstname" class="dijitInputField dijitFormWidget" type="text" tabindex="0" maxlength="999999" size="20" name="" autocomplete="off" dojoattachevent="onfocus,onkeyup,onkeypress:_onKeyPress" dojoattachpoint="textbox,focusNode" style="" widgetid="firstname" value="" valuenow="" disabled="false"/>
after parsing.

Getting asynchronous

As written in the YUI team's study, browsers can only download two resources simultaneously from the same hostname. This means you can easily speed up downloads by serving part of your content from additional hostnames (hence the "transferring data from static.foobar.com" message that sometimes appears in your status bar while you're browsing foobar.com : they optimize the loading of their static content by serving it from a different domain).

Applying that rule to Dojo has been a piece of cake since version 0.9, thanks to the AOL CDN. Instead of including your own local copy of Dojo, follow the guidelines and include the AOL-hosted version. Not only will your scripts benefit from the CDN and load at the same speed from any corner of the planet, but also, and more importantly, they will be loaded from an AOL domain. As a consequence, your browser will load the Dojo scripts in parallel with the other resources of your page.
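
To get a feel for what the two-connections rule implies, here is a back-of-the-envelope model in JavaScript. It is only a sketch : it assumes every file takes the same time to download (one "round"), and the function name and the example.com URLs are invented for the illustration.

```javascript
// Rough model: every download takes one "round", and the browser opens
// at most `perHostLimit` connections to each hostname at the same time.
// Returns the number of rounds needed to fetch all the URLs.
function estimateRounds(urls, perHostLimit) {
  var perHost = {};
  urls.forEach(function (url) {
    var host = new URL(url).hostname;
    perHost[host] = (perHost[host] || 0) + 1;
  });
  var rounds = 0;
  Object.keys(perHost).forEach(function (host) {
    // Hosts are fetched in parallel, so the busiest one sets the pace.
    rounds = Math.max(rounds, Math.ceil(perHost[host] / perHostLimit));
  });
  return rounds;
}

// Six scripts served from a single hostname...
var sameHost = [
  "http://www.example.com/a.js", "http://www.example.com/b.js",
  "http://www.example.com/c.js", "http://www.example.com/d.js",
  "http://www.example.com/e.js", "http://www.example.com/f.js"
];

// ...versus the same six scripts split across two hostnames.
var splitHosts = [
  "http://www.example.com/a.js", "http://www.example.com/b.js",
  "http://www.example.com/c.js", "http://static.example.com/d.js",
  "http://static.example.com/e.js", "http://static.example.com/f.js"
];
```

With a limit of two connections per hostname, estimateRounds(sameHost, 2) gives 3 rounds while estimateRounds(splitHosts, 2) gives 2. Real downloads are of course not equal in size, so take this as an order of magnitude only.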

Disabling autoparsing

The thing to know is that parsing is needed because Dojo doesn't know where to look for its widgets, and thus parses the whole document. In Dojo 0.9, autoparsing can be disabled by replacing djConfig="parseOnLoad:true" with djConfig="parseOnLoad:false" on the line that includes dojo.js in the document. Parsing can then be triggered manually on all the children of a given DOM node :
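dojo.require("dojo.parser"); // the parser module has to be loaded explicitly
dojo.addOnLoad(init);

// Called once the page and the required modules are ready
function init()
{
   // Only widgetify the nodes found under the form, not the whole document
   dojo.parser.parse(dojo.byId('myForm'));
}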

As for the results...



Enabling parallel downloads from the CDN obviously speeds up the whole thing (6.38s to 2.57s to load the whole page).

However, contrary to what I expected, parsing the DOM tree manually takes slightly more time than letting Dojo parse automatically while loading the document. A bunch of articles about Dojo still claim that manual parsing is faster; however, if you look closer, these articles are all about Dojo 0.4 (especially if they refer to "searchIds", now deprecated). Autoparsing has undergone many optimizations since then, and version 0.9 doesn't need such tricks anymore.


This post has been completed while listening to :

Don - Original Soundtrack
Don OST (Shankar Ehsaan Loy)

Wednesday 17 October 2007

Why Dojo ?

In the beginning was the demo... and like all fund-raising demos, it had to be sexy, get straight to the point and be quick to build.
As I needed "Web 2.0" javascript controls to put on these pages and impress the investors, I chose the Dojo framework.
Why Dojo ? Well, just because it was in the online press at the right moment, and I have to say their demo page was sexy and efficient at that time.

That was last year.

Now guess what ? I'm still sticking with the same javascript framework, which I recently migrated to its newest version.
Why Dojo, once again ? After the demo (which actually came along with a speech and a business plan), things went pretty fast and I found myself dealing with the real thing, with no extra time to spend on a javascript benchmark.
That said, I didn't even think of questioning my choice - it was there and running, and everyone seemed pretty satisfied with it. Now that I've put more time into understanding the 0.9 architecture, I'm all the more satisfied with my early choice.

So what's the point of this post ? When Dojo released their new online documentation - The Dojo Book - in September, I stumbled upon its "Why Dojo ?" page and realized that the "race for production" I had been thrown into had made me forget that there was something other than Dojo out there : not only the six major toolkits mentioned in the Dojo Book, but also a couple of others.
Seeing all these projects made me want to try more stuff, just to see how they would fit into our existing sites if we wanted to replace Dojo with something else.

Anyway, since my boss would probably choke if I took our precious time to change something that already works, I'll just take a couple of hours here and there to see what each of them can do. Better to stay aware than to blind oneself just because choices have already been made - at least that's true in the context of IT :)


This post has been completed while listening to :

Just One of Those Things (Lionel Hampton & Oscar Peterson)

Sunday 7 October 2007

Optimizing Javascript - Load and wait

Load everything from the start = bad

I've been optimizing most of our AJAX apps lately, and found out (thanks to YSlow) that one of the many causes of the sluggish load time of these apps is the fact that every JS file that might be needed during a user session is loaded right at the start of the app.

Obviously, I didn't need YSlow to realize that it was a very bad thing. As an OO programmer, I have learned to instantiate an object only when I'm going to use it. So, in a webapp context, why load every possible javascript file when I just need a bunch of them ? The thing I needed was dynamic JS loading.

On demand Javascript

The next question was : "technically, how do I tell the browser to load a JS file after the page has been loaded ?". I found the solution at Ajaxpatterns.org, in this article about on-demand JS. It shows that basically, loading a file after page load is no more than adding a new DOM node to the document tree.

That said, loading a new JS file into memory is one thing, but the article underlines another related problem without directly giving the solution : how do I detect that the script has been fully loaded and is ready to use ?

In other words, you'll quickly find that it's impossible to write :
dynamicLoad("showStuff.js");
showStuff("test");
because when you call showStuff(), the dynamically included script is still being loaded, and is not available to use yet !

Load and wait

My first attempt at implementing the "load and wait" functionality was to write an additional function that would call the dynamic loading function and use setTimeout() to check the availability of a given function a while later.

The result can be found here : dynamic_load.js.
Here's a sample call :
js_include_once_wait( { func:"displayStuff", files:[ "js/displayLib.js", "js/utils.js"] } );
The call uses an object as a parameter. Here are its fields and their meaning :
  • func : name of the function to call when everything is loaded
  • funcTest (optional) : name of the function whose availability proves that everything is loaded. If not defined, funcTest gets the same value as func
  • files: array of filenames to load. They will be loaded in the same order as they are declared (I'm using that to manage dependencies). The last file should contain funcTest.
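
For readers who just want the general idea, here is a minimal sketch of the polling approach. It is not the content of dynamic_load.js : the names injectScript, waitFor and loadAndCall, as well as the interval, maxTries, schedule, inject and scope options, are made up for this example. Unlike the function described above, it injects all the files at once and doesn't try to manage the loading order.

```javascript
// Hypothetical helper: appends a <script> node to the document so that
// the browser fetches and runs the file.
function injectScript(url) {
  var script = document.createElement("script");
  script.type = "text/javascript";
  script.src = url;
  document.getElementsByTagName("head")[0].appendChild(script);
}

// Polls until isReady() returns true, then calls onReady().
// Gives up after `maxTries` polls and calls onTimeout() if provided.
// `schedule` defaults to setTimeout; it can be swapped for testing.
function waitFor(isReady, onReady, options) {
  options = options || {};
  var interval = options.interval || 50;
  var maxTries = options.maxTries || 100;
  var schedule = options.schedule || setTimeout;
  var tries = 0;

  function poll() {
    if (isReady()) {
      onReady();
    } else if (++tries < maxTries) {
      schedule(poll, interval);
    } else if (options.onTimeout) {
      options.onTimeout();
    }
  }
  poll();
}

// Injects the files, then calls the function named `funcName`
// (looked up on `scope`, the global object by default) once it exists.
function loadAndCall(files, funcName, args, options) {
  options = options || {};
  var inject = options.inject || injectScript;
  var scope = options.scope || globalThis;
  for (var i = 0; i < files.length; i++) {
    inject(files[i]);
  }
  waitFor(
    function () { return typeof scope[funcName] === "function"; },
    function () { scope[funcName].apply(scope, args || []); },
    options
  );
}
```

A call could look like loadAndCall(["js/displayLib.js"], "displayStuff", ["test"]). The maxTries limit is there so that a missing or broken file doesn't leave the page polling forever.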

However, the heavy use of setTimeout looks somewhat ugly to me. I've just read a piece of code from PHPied that uses JS events but seems to have an issue with Safari. I'll post again when I have made progress on this...
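
For the record, an event-based loader generally looks like the sketch below. This is not the PHPied code and I make no claim about how it behaves in Safari; loadScript, loadInOrder and the optional doc parameter are names chosen for the example. It relies on the script node's onload handler, plus onreadystatechange for the browsers that report readyState instead.

```javascript
// Injects one script and calls onLoad(url) once the browser reports
// that it has been loaded.
function loadScript(url, onLoad, doc) {
  doc = doc || document;
  var script = doc.createElement("script");
  var done = false;

  function finish() {
    if (done) { return; } // make sure the callback only fires once
    done = true;
    script.onload = script.onreadystatechange = null;
    onLoad(url);
  }

  script.onload = finish;
  script.onreadystatechange = function () {
    if (script.readyState === "loaded" || script.readyState === "complete") {
      finish();
    }
  };
  script.src = url;
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}

// Loads the files one after the other (so that dependencies are
// respected), then calls onAllLoaded().
function loadInOrder(urls, onAllLoaded, doc) {
  var index = 0;
  function next() {
    if (index === urls.length) {
      onAllLoaded();
      return;
    }
    loadScript(urls[index++], next, doc);
  }
  next();
}
```

Loading the files one by one gives up parallel downloads in exchange for a predictable order, which seems a fair trade when the second file depends on the first.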


This post has been completed while listening to :

Voices (Vangelis)

Sunday 23 September 2007

SSL Woes

I recently had to add an SSL certificate to an Apache web server (v2.2.4) that already had one certificate for a site running on port 443. That Apache server runs a couple of websites, each having its own domain name. Apache handles that situation thanks to name-based virtual hosts, a feature that allows it to serve multiple domains on the same IP:port (obviously, port 80 for non-secured websites).
It would have been natural to think that it would work the same with SSL. Not at all, actually... after a few unsuccessful attempts, I realized that it is not possible to secure two domains hosted on the same server, using the same IP and the same port. The reason is simple : by design, SSL authentication is done before Apache can check the host name.

As a result, when you ask for a secured site on one IP:port, you always get served the first matching virtual host of the Apache config, no matter what domain you ask for.
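
The behaviour can be pictured with a toy model, written here in JavaScript. It is in no way Apache's actual code : the certificateFor function, the data layout, the certificate file names and the 192.0.2.x addresses (a range reserved for documentation) are all made up for the illustration. The point is that the requested host name is not even an input of the selection.

```javascript
// Toy model only. When the SSL handshake starts, the server knows the
// IP and port the client connected to, but not yet the host name.
function certificateFor(vhosts, ip, port) {
  for (var i = 0; i < vhosts.length; i++) {
    if (vhosts[i].ip === ip && vhosts[i].port === port) {
      return vhosts[i].cert; // first match wins
    }
  }
  return null;
}

// Two secured sites sharing the same IP:port.
var sharedAddress = [
  { ip: "192.0.2.10", port: 443, serverName: "brandone.com", cert: "brandone.crt" },
  { ip: "192.0.2.10", port: 443, serverName: "brandtwo.com", cert: "brandtwo.crt" }
];

// The same two sites, each reachable on a distinct address.
var distinctAddresses = [
  { ip: "192.0.2.10", port: 443, serverName: "brandone.com", cert: "brandone.crt" },
  { ip: "192.0.2.11", port: 443, serverName: "brandtwo.com", cert: "brandtwo.crt" }
];
```

With sharedAddress, any connection to 192.0.2.10:443 gets brandone.crt, even when the visitor typed brandtwo.com. As soon as something in the address differs between the two sites, the selection becomes unambiguous.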

I thought of four ways to solve that issue :

  • Run all secured areas under the same certificate

e.g. secure.mycompany.com serves secured content for brandone.com and brandtwo.com.
While that option would be convenient for non-commercial sites or intranet tools, it was out of the question, as brand names are very important to us.
Having clients redirected to a different domain when they have to pay online is not a smart idea, especially when they haven't noticed that mycompany owns brandone and brandtwo !

  • Use another port for the 2nd secured site

That's what I did as a quick workaround : the 2nd site ran on port 444 for several weeks, but we knew some heavily firewalled people (don't laugh; some of them happen to have influence :/ ) couldn't reach the secured parts of the site.

  • Use an HTTP-aware router

As I knew our host runs an advanced firewall, I phoned them and asked whether they could do host-based dispatching before the requests reach Apache, by proxying them like this :

brandone.com:443 ->firewall-> apache:85
brandtwo.com:443 ->firewall-> apache:86

The answer was that it was possible... however, it would require a complex setup, and it would be impossible to reproduce the whole thing in our test and pre-production environments, which left me with a bad feeling.

  • Use another IP for the 2nd secured site

Some of you might have already guessed : the smartest solution is to associate as many IPs with the web server as there are domains to secure. Fortunately, our host had extra IPs to give... er, sell us :)


modSSL FAQ
Apache v2.2 - Name virtual hosts


This post has been completed while listening to :

http://ff7.ocremix.org - Voices of the Lifestream (OCRemix community)