Impressions from BioIT World
I attended BioIT World last week as part of the GenoLogics crew that traveled out there. From a marketing and sales perspective it was an excellent show. Traffic at our booth was steady and it was very busy at times.
I thought the technology side of things was a little disappointing. I walked away with very little concrete new information. Most of the talks focused on what we "should" be doing, especially with regard to the semantic web and RDF technology. This is quite interesting, but there is nothing new here; I'm sure most of the audience had already heard it many times. I would have been much more interested in concrete examples of how this was actually implemented and put to use. I suppose the problem is that big pharma (which has the money and resources to actually do this) isn't interested in sharing its "secrets", since they are considered a competitive advantage.
Personally, I'm still very skeptical about the semantic web and its feasibility in practice. While the technology certainly makes sense, the manual effort of unifying the many different systems and mapping them to an established common vocabulary seems almost insurmountable. This is made even more difficult by the fact that many small to mid-size academic labs do not have a proper data management system and just work with Excel files stored in some sort of directory structure. Good luck indexing that and mapping the contents to an ontology.
The most interesting talks were about IT infrastructure for next-generation sequencing. The talks from the Broad Institute and Harvard were great. Some takeaways:
- One next-generation 454 sequencer generates as much data as 399 current ABI 3730s!
- 1 gigabit networks are barely adequate for the data; 10 gigabit is the way to go.
- In general, though, moving that much data around the network is impractical, so they just swap disks and physically move them between computers. This was termed "SneakerNet". :-)
- Even if the raw numbers add up, your infrastructure might fail due to secondary effects. For example, the Broad Institute's disk array was large enough to hold the data, but the processing software kept hitting the same areas of the disk, which caused it to fail. They then had to switch to clustered storage.
- This is as much a "social" problem as a technology problem. Researcher expectations have to be reset, since we realistically cannot keep all the data around forever, and the data will not always be available instantly.
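To get a feel for why the network numbers above matter, here is a rough back-of-the-envelope sketch of ideal transfer times. The 1 TB data volume and the 70% link efficiency are my own illustrative assumptions, not figures from the talks, and the function name is just something I made up for this example.

```python
def transfer_hours(data_terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_terabytes (decimal TB) over a link of link_gbps.

    efficiency is an assumed fraction of the nominal line rate actually achieved
    (protocol overhead, disk speed, contention); 1.0 means a perfect link.
    """
    bits = data_terabytes * 1e12 * 8             # decimal terabytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / 3600         # seconds -> hours


if __name__ == "__main__":
    # Hypothetical 1 TB of sequencing output, assuming 70% of line rate.
    for gbps in (1, 10):
        print(f"{gbps:>2} Gbit/s: {transfer_hours(1.0, gbps, efficiency=0.7):.2f} hours")
```

Even in the ideal case, 1 TB takes a bit over two hours on a 1 gigabit link and roughly 13 minutes on 10 gigabit, before any real-world slowdowns. Multiply that by many instruments running continuously and it is easy to see why carrying disks around starts to look attractive.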