About

AssembleMe is an information science blog written by Julius Schorzman that frequently sways off-topic.

Julius is the CEO of the Google Ventures-backed company DailyCred. DailyCred makes working with OAuth super duper simple.

To view some of my old projects, visit Shopobot or CodeCodex.

You can follow me on Twitter at @schorzman, if you really want to.

    Tuesday, July 20, 2004

    Researchers: Excel Screwed My Data.

    INFO SCIENCE: I don't know why the hell these people are using Excel to store genetic information; isn't there something better out there? (If not, what are all those bioinformatics graduates doing with their time?) In any case, they are using Excel, and Excel is doing what it's programmed to do: mess with your shizzle in annoying ways.



    Excel is widely used in genetic research to process microarray data. A microarray chip detects the amounts of messenger RNA transcribed from thousands of different genes, enabling researchers to see which particular gene is being expressed in a sample of diseased tissue, for example.



    The errors are introduced because some genetic identifiers look very like dates to Excel. If the spreadsheet is not properly set up, it will convert an identifier, such as SEPT2, to a date: 2-Sep. The conversion, the researchers say, is irreversible: once the error has been introduced, the original data is gone.



    In a paper published on BioMed Central, Zeeberg et al. explain that they noticed some identifiers were being converted into non-gene names.
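    If you're stuck handing gene lists to Excel anyway, one defensive step is to flag identifiers that look like dates before the file ever gets opened, so you know which columns to force to text. Here's a minimal Python sketch of that idea. The month-word list and the "month word plus day number" pattern are my own assumptions, generalized from examples like SEPT2; they are not Excel's actual parsing rules, and the function names are made up for illustration.

```python
import re

# Assumption: an English-locale Excel tends to read "<month word><day number>"
# (e.g. SEPT2) as a date. This is a rough heuristic, not Excel's real parser.
MONTH_WORDS = {
    "jan", "january", "feb", "february", "mar", "march", "apr", "april",
    "may", "jun", "june", "jul", "july", "aug", "august",
    "sep", "sept", "september", "oct", "october", "nov", "november",
    "dec", "december",
}

# Letters, an optional separator, then one or two digits.
PATTERN = re.compile(r"^([A-Za-z]+)[-/ ]?(\d{1,2})$")


def looks_like_excel_date(identifier: str) -> bool:
    """Return True if the identifier resembles a month-plus-day token."""
    match = PATTERN.match(identifier.strip())
    if not match:
        return False
    word, day = match.group(1).lower(), int(match.group(2))
    return word in MONTH_WORDS and 1 <= day <= 31


def flag_risky(identifiers):
    """List identifiers that should be forced to text before Excel sees them."""
    return [ident for ident in identifiers if looks_like_excel_date(ident)]


if __name__ == "__main__":
    # Sample inputs for illustration only.
    genes = ["SEPT2", "MARCH1", "DEC1", "BRCA1", "TP53"]
    print(flag_risky(genes))  # ['SEPT2', 'MARCH1', 'DEC1']
```

    Anything it flags should be imported as a Text column (or kept out of Excel entirely), because once the cell holds 2-Sep the original string is gone.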



    The problem here is obvious (to anyone who doesn't work at Microsoft, anyway). When Excel suspects a cell holds a date, it should change only how the cell is displayed, not the underlying data. This has always annoyed me about Excel. "1-1" can mean a number of things: 1 for 1, 1:1, or 1 - 1 = 0; but to Excel it's always "Jan-1-[this year]." If you go back and try to format the date back into a number or text, you'll find your entry has been trashed and replaced with something like 37987 (Excel's serial number for 1/1/2004, which counts 1/1/1900 as day 1 and includes a February 29, 1900 that never existed).
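    For the curious, here's where a number like 37987 comes from. Excel's default 1900 date system (older Mac versions defaulted to a 1904 system instead) stores a date as a count of days with 1/1/1900 as day 1, and it also counts February 29, 1900, a leap day that never happened, inherited from Lotus 1-2-3. A minimal Python sketch, assuming the 1900 system and dates from March 1, 1900 onward, absorbs both offsets by counting from December 30, 1899. The function names are hypothetical.

```python
from datetime import date, timedelta

# Assumes Excel's default "1900 date system": serial 1 is 1900-01-01 and
# serial 60 is the nonexistent 1900-02-29. Counting from 1899-12-30 absorbs
# both offsets, so this matches Excel only on or after 1900-03-01 (serial 61).
EXCEL_EPOCH = date(1899, 12, 30)
FIRST_SAFE_DATE = date(1900, 3, 1)


def to_excel_serial(d: date) -> int:
    """Return the serial number Excel stores for date d."""
    if d < FIRST_SAFE_DATE:
        raise ValueError("dates before 1900-03-01 hit Excel's fake leap day")
    return (d - EXCEL_EPOCH).days


def from_excel_serial(serial: int) -> date:
    """Turn an Excel serial number (61 or higher) back into a date."""
    if serial < 61:
        raise ValueError("serials below 61 hit Excel's fake leap day")
    return EXCEL_EPOCH + timedelta(days=serial)


if __name__ == "__main__":
    print(to_excel_serial(date(2004, 1, 1)))           # 37987
    print(from_excel_serial(37987))                    # 2004-01-01
    print((date(2004, 1, 1) - date(1900, 1, 1)).days)  # 37985, the real gap
```

    So typing 1-1 during 2004 leaves 37987 in the cell, two more than the real number of days between the two dates. Converting the serial back gets you a date, but never the "1-1" you actually typed.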



    Someday, hopefully by the end of the century, we'll all have competent office software.
