a better way to detect Java WebStart

So, just as I wrote that I wouldn't be posting much in future, I thought I would quickly post this little tidbit. There are various techniques described online for detecting whether a client has Java WebStart installed. The problem is that all of them require you to mix in VBScript for Internet Explorer and then use JavaScript for the other browsers. It turns out you can support Internet Explorer and still use only JavaScript. Check it out:

function jwsInstalled() {
    // For Internet Explorer: creating the ActiveX object throws
    // an exception if Java WebStart is not installed.
    if (navigator.userAgent.indexOf('MSIE') > -1) {
        try {
            new ActiveXObject('JavaWebStart.isInstalled');
            return true;
        } catch (e) {
            return false;
        }
    }

    // Firefox is happy with "x-java-jnlp-file". For Chrome and Safari
    // this does not work, instead I just check for "x-java-vm".
    // If they have a recent JVM installed, then they usually also have
    // Java WebStart installed.
    var mimeTypes = navigator.mimeTypes;
    // The !! makes sure the function always returns true or false.
    return !!(mimeTypes && mimeTypes.length &&
              (mimeTypes['application/x-java-jnlp-file'] != null ||
               mimeTypes['application/x-java-vm'] != null));
}

I use this to pop up a little dialog and tell users to download a new JRE if Java WebStart is not installed. It works pretty well. And yes, it's 2009 and I'm still using Java WebStart ... although I do have a good reason for it. :-)

so it's called "printcasting"

I've been pretty busy lately and haven't worked on zinepal.com that much.

Last week I also found out about the Knight Foundation News Challenge. The Knight Foundation is a not-for-profit group that grants awards to new media and journalism projects through their News Challenge. Unfortunately the 2009 challenge closed on November 1st and I didn't have enough time to apply.

Last year the printcasting.com project was awarded a significant grant through this challenge. The interesting thing is that their ideas are pretty much exactly the same things I am trying to achieve with zinepal.com. Now I really wish I had heard about this challenge sooner.

But at least now I know what the whole concept behind zinepal.com is called: printcasting.

zinepal.com - create custom printable zines from any online content

zinepal.com is my latest project, and I've been working on it for half a year now. Zinepal enables you to easily create custom printable zines from any online content. While it is primarily intended for blog content, it will work with any web page that provides substantial text or image content that can be isolated and reformatted for printing.

The Idea

There are two main ideas that motivated me to work on zinepal.com. The first is to bridge the gap between online media and the traditional paper media. Zinepal enables bloggers to easily make their content available as a printable zine. When readers print the blog zine the content is now reaching a whole new audience. Readers in coffee shops, on the bus, in the park ... all the places the Internet either doesn't go or is inconvenient to use. I mean, who wants to read the newspaper on their iPhone? This can create a viral effect as readers leave behind copies of the blog zine and new readers pick it up. For example, you may find a blog zine from local blogs in your favorite coffee shop, providing you with great alternative content that is relevant to you.

The other idea behind Zinepal is to cater to another group of Internet users. Currently you could broadly classify Internet users as content creators and content consumers. Content creators are the bloggers that regularly write on their blog. The consumers are the readers of blog content and other Internet news outlets. As a creator I may frequently write blog content that is not interesting to a larger audience. The consumers are now faced with the task of sorting through many blogs to find the small nuggets of content they are interested in. For example, some of my friends have hundreds of blogs in their feed readers and have to sort through all the uninteresting content to find the good stuff.

This is where the new group of Internet users comes in: the editors. As opposed to sites driven by popular opinion such as Digg or Reddit, the editors focus on their specific topics of interest and create zines based on this. As a reader I can then follow the zines of the editors I trust. I now have a human filter that does all the work of sorting through blogs for me. The advantage to the editors is the ability to gain recognition and readership for their custom zines, the same way good bloggers gain readership for their blog.

Technical Challenges

The biggest technical challenge for Zinepal was coming up with the technology to reliably extract and reformat content from all the different blogs and websites on the Internet. This was required because most RSS or Atom feeds only include snippets of the content. I've spent most of my time so far working on this technology and getting it to the point where it works reasonably well. It's still not perfect and I can think of a few more important enhancements to make, but for a start I think it is good enough.

Lately I have instead been focusing on the website part of Zinepal to enable users to actually start using the technology. So please, go ahead and visit zinepal.com to give it a try!

File Type Manager source code

Due to the renewed popularity of File Type Manager (mainly because of Windows Vista) I've decided to make the source code available. Maybe somebody else feels like working on this program some more. Keep in mind that I wrote this when I was in high school and just learning how to program, so it probably isn't the greatest code. Also, it's written in Visual Basic 6. Ugh.

File Type Manager 2.0.1 Source Code

Note that I've licensed it under the LGPL. It includes an ActiveX control that displays the file types, so you could re-use that somewhere else if you wanted to.

Working Part-Time, Moving, Starting Web 2.0 Project

I thought I would write a quick blog entry to update everyone on the latest happenings...

First off, since November 1st I'm only working part-time at GenoLogics. I'm spending two days a week working on a personal project. I came up with (what I think is) a really good idea for a Web 2.0 project. So I've cast away the chains of J2EE and I'm working with PHP and Drupal to create a nice Web 2.0 site. Yes, it will have all that AJAX goodness that the modern geek (user) is accustomed to.

Why PHP/Drupal, you ask? Well, I did look into Ruby on Rails and also some Python frameworks. The thing is that I know PHP/Drupal very well, so I can be productive very quickly. At this point I just didn't want to invest the time to learn a new framework. Also, Ruby on Rails just didn't really turn me on, although granted I spent very little time looking at it. With Drupal I get so much infrastructure that is already provided for me: security, comments, user profiles, theming, page generation, etc. I'm not sure why I would want to use Rails and roll it all myself. Having the support of a large Drupal user community backing up your infrastructure is also a big plus.

Anyway, the next thing is that I'm moving to Vancouver on December 1st. I'm getting a little bored in Victoria and I also think that the technology scene in Vancouver will be better. I'm looking forward to checking out the Drupal and PHP user groups. Finding a place to live in Vancouver was pretty tough, but in the end I found a nice 1 bedroom in Kitsilano. So I guess I'm all set. :-)

On another note: GenoLogics is hiring. You should apply. It's a good place.

Open a File in the Default Application using the Windows Command Line (without JDIC)

Quite a few people have asked me about this in the past. If you have a file, how can you open it in the default associated application without querying the registry or using some other Windows API? Or, if you program in Java, how can you do it without using JDIC?

The easiest way to do this is using the "start" command. For example to open the file "readme.txt" in the default text editor you would do this:

C:\>start readme.txt

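You can also use start to open folders or follow shortcuts. When the name has to be quoted, pass an empty "" first, because start treats its first quoted argument as the title of a new window:

C:\>start "" "My Shortcut"    <-- note that you don't need .lnk at the end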

This will open the target of the "My Shortcut" shortcut. If the shortcut points to a folder, it will open a Windows Explorer window for it. If the shortcut points to a document, it will open it in the default application. And if the shortcut is for a program, it will launch the program.

The trick is that "start" isn't an executable. It is a built-in command of the Windows command line interpreter "cmd.exe". In Java (and other languages) if you try to create a process using the "start" command this will fail -- since there is no "start.exe" executable in the system.

Instead you have to invoke "start" through the "cmd.exe" interpreter. This can be done using the /c flag:

cmd /c "start readme.txt"

This can be run successfully in Java using Runtime.exec() or a ProcessBuilder, whereas calling "start" directly would fail. The same limitation applies to other commands that are built into cmd.exe, such as "dir" and "copy". If a command fails to launch, try running it through "cmd /c".
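Here is a minimal Java sketch of the "cmd /c start" approach using ProcessBuilder. The class and method names (DefaultAppLauncher, buildCommand) are made up for illustration. The sketch also assumes you want to pass an empty "" title argument: "start" treats its first quoted argument as a window title, so this keeps a quoted path with spaces from being mistaken for one. The process is only launched when the JVM reports a Windows OS.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Sketch: open a file in its default Windows application via "cmd /c start". */
final class DefaultAppLauncher {

    /**
     * Builds the argument list for ProcessBuilder. Kept separate from the
     * launch so the command can be inspected without starting anything.
     */
    static List<String> buildCommand(String path) {
        // "start" is a cmd.exe built-in, so it has to go through "cmd /c".
        // The literal "" is an empty window title (an assumption of this
        // sketch, see the lead-in); the path follows as its own argument.
        return Arrays.asList("cmd", "/c", "start", "\"\"", path);
    }

    /** Returns true if the JVM reports that it is running on Windows. */
    static boolean isWindows() {
        return System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT)
                .startsWith("windows");
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "readme.txt";
        List<String> command = buildCommand(path);
        System.out.println("Command: " + String.join(" ", command));

        if (isWindows()) {
            // Returns as soon as cmd.exe has been started; it does not
            // wait for the opened application to exit.
            new ProcessBuilder(command).start();
        } else {
            System.out.println("Not on Windows, nothing launched.");
        }
    }
}
```

Passing the arguments as a list (rather than one long string) leaves the quoting of paths with spaces to ProcessBuilder.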

JBossMQ JMS over HTTP performance gotchas

Here's a lesson I learned recently: Don't use JMS over HTTP if you want to have anything close to high-throughput, at least not if you are using JBossMQ. This may also be the case for other providers, depending on their HTTP client implementation.

And here's why: When a JMS over HTTP client is subscribed to a JMS destination it is actually polling the server. It will connect, receive a message, close the connection, reconnect, get the next message, etc. That's because HTTP is a request/response protocol. The server has no way to call back to the client, and the client here does not keep a connection open between requests the way a binary protocol might. The client needs to reconnect to the server with a new HTTP request every time it wants to check for messages.

If you are receiving a lot of messages this will result in very frequent HTTP requests [1]. This causes memory and threading problems on the server as it spins up new threads to handle the requests.

In my case this is made worse by the fact that the client will very frequently close an existing JMS consumer and create a new consumer with a different JMS selector. What happens in this case is that the new consumer will use a new outgoing port for its HTTP connection [2]. As the client rapidly creates new consumers and makes connections it will use up more and more ports. Windows is slow in cleaning up relinquished ports, so under heavy load when receiving a lot of messages and creating new consumers the client will eventually fail to connect when all ports are used up [3]. Making so many frequent HTTP connections on the client also causes memory and threading issues.
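A rough back-of-the-envelope sketch of why the ports run out. The numbers are assumptions, not measurements from this setup: older Windows versions hand out outgoing ports from roughly 1025 to 5000 by default, and a closed connection keeps its port reserved (TIME_WAIT) for about 240 seconds. Under those assumptions a client cannot sustain much more than 16 to 17 new connections per second. The class and method names are made up for illustration.

```java
/** Back-of-the-envelope estimate of when a client runs out of outgoing ports. */
final class PortExhaustionEstimate {

    /** Number of ports in the inclusive range [firstPort, lastPort]. */
    static int availablePorts(int firstPort, int lastPort) {
        return lastPort - firstPort + 1;
    }

    /**
     * Highest sustained rate of new connections per second. Each closed
     * connection is assumed to hold its port for timeWaitSeconds, so the
     * whole range can be cycled through once per timeWaitSeconds.
     */
    static double maxConnectionsPerSecond(int firstPort, int lastPort, int timeWaitSeconds) {
        return (double) availablePorts(firstPort, lastPort) / timeWaitSeconds;
    }

    public static void main(String[] args) {
        // Assumed defaults on older Windows versions (not measured here).
        int firstPort = 1025;
        int defaultLastPort = 5000;
        int timeWaitSeconds = 240;

        // Assumed upper limit after raising the port range (see note [3]).
        int raisedLastPort = 65534;

        System.out.printf("Default range: %d ports, about %.1f connections/s%n",
                availablePorts(firstPort, defaultLastPort),
                maxConnectionsPerSecond(firstPort, defaultLastPort, timeWaitSeconds));
        System.out.printf("Raised range:  %d ports, about %.1f connections/s%n",
                availablePorts(firstPort, raisedLastPort),
                maxConnectionsPerSecond(firstPort, raisedLastPort, timeWaitSeconds));
    }
}
```

Under the same assumptions, raising the port limit only moves the ceiling. A client that opens a new connection for every message will still hit it under enough load.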

Luckily this issue is easily addressed by switching to a different JMS protocol. With the JBossMQ UIL2 protocol, only a single port and a single persistent connection are used for JMS. This allows the client to rapidly receive messages and create/close consumers without problems.

I thought this was an interesting problem since initially the implications of using JMS over HTTP weren't clear to me. The original idea of going over HTTP was to avoid opening an additional port on the server.

Notes:

[1] It is possible to set a property that will cause the client to wait before reconnecting to get the next message. However, this is not desirable if you want the client to process messages as quickly as possible.

[2] Using a new outgoing port may happen even if you're receiving messages using the same consumer, without closing/creating new consumers. I didn't test that case.

[3] On Windows the port limit can be increased to work around this part of the problem: KB196271

Impressions from BioIT World

I attended BioIT World last week as part of the GenoLogics crew that traveled out there. From a marketing and sales perspective it was an excellent show. Traffic at our booth was steady and it was very busy at times.

I thought the technology side of things was a little disappointing. I walked away with very little new concrete information. Most of the talks focused on what we "should" be doing, especially with regard to the semantic web and RDF technology. This is quite interesting, but there is nothing new here. I'm sure most of the audience has already heard this many times. I would have been much more interested to see some concrete examples of how this was actually implemented and put to use. I suppose the problem is that big pharma (who has the money + resources to actually do this) isn't interested in sharing their "secrets" since it is considered a competitive advantage.

Personally I'm still very skeptical about the semantic web and its feasibility in practice. While the technology certainly makes sense, the manual effort of unifying the many different systems and mapping them to an established common vocabulary seems almost insurmountable. This is made even more difficult by the fact that a large number of small to mid-size labs in academia do not have a proper data management system and are just working with Excel files stored in some sort of directory structure. Good luck indexing that and mapping the contents to an ontology.

The most interesting talks were around IT infrastructure for next generation sequencing. The talks from the Broad Institute and Harvard were great. Some takeaways:

  • 1 next generation 454 sequencer generates as much data as 399 current ABI 3730s!
  • 1 gigabit networks are barely adequate for the data, 10 gigabit is the way to go
  • But in general moving that much data around the network is impractical, so they just swap disks and move them between computers. This was termed "SneakerNet". :-)
  • Even if the raw numbers add up, your infrastructure might fail due to secondary effects. For example the Broad Institute disk array was large enough for the data, but it failed since processing software kept hitting the same areas of the disk. This caused the disk to fail. They then had to switch to clustered storage.
  • This is as much a "social" problem as a technology problem. Researcher expectations have to be reset, since we realistically cannot keep all the data around forever and the data will not always be available in an instant.