Researchers: Excel Screwed My Data.
Tuesday, July 20, 2004 at 1:22PM
INFO SCIENCE: I don't know why the hell these people are using Excel to store genetic information. Isn't there something better out there? (If not, then what are all those bioinformatics graduates doing with their time?) In any case, they are using Excel, and Excel is doing what it's programmed to do: mess with your shizzle in annoying ways.
Excel is widely used in genetic research to process microarray data. A microarray chip detects amounts of protein produced from thousands of different genes, enabling researchers to see which particular gene is being expressed in a sample of diseased tissue, for example.
The errors are introduced because some genetic identifiers look very like dates to Excel. If the spreadsheet is not properly set up, it will convert an identifier, such as SEPT2 to a date: 2-Sep. The conversion, the researchers say, is irreversible: once the error has been introduced, the original data is gone.
In a paper published on BioMedCentral, Zeeberg et al. explain that they noticed that some identifiers were being converted to non-gene names.
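To make the failure mode concrete, here is a rough Python sketch of a check you could run over a list of identifiers before they ever touch a spreadsheet. The month list, the pattern, and the function name `looks_like_date` are all my own assumptions for illustration; this is not Excel's actual parsing logic, which recognizes far more date shapes than this.

```python
import re

# Month abbreviations a spreadsheet might recognize. This is an assumed
# list for illustration, not Excel's real parsing rules.
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC")

# A month abbreviation, an optional hyphen, then a one- or two-digit day.
_DATE_LIKE = re.compile(
    r"^(?:%s)-?(\d{1,2})$" % "|".join(MONTHS), re.IGNORECASE
)

def looks_like_date(identifier: str) -> bool:
    """Return True if the identifier could be misread as month + day."""
    m = _DATE_LIKE.match(identifier.strip())
    # Only flag it when the numeric part is a plausible day of the month.
    return bool(m) and 1 <= int(m.group(1)) <= 31

# Flag the risky names in a small sample list.
for name in ["SEPT2", "BRCA1", "DEC1"]:
    print(name, looks_like_date(name))
```

Anything this flags is worth importing as explicit text rather than letting the spreadsheet guess.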
The problem here is obvious (to anyone who doesn't work at Microsoft, anyway). Excel should only change how the cell is displayed if it suspects it's a date, not change the underlying data. This has always annoyed me about Excel. "1-1" can mean a number of things: 1 for 1, 1:1, 1-1=0; but to Excel it's always "Jan-1-[this year]." If you go back and try to format the cell as a number or text, you'll find your entry has been trashed and replaced with something like 37987 (Excel's serial number for 1/1/2004, which counts days from the start of 1900).
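If you want to check where a number like 37987 comes from, here is a small Python sketch of the arithmetic. It assumes Excel's default 1900 date system, and the helper name `excel_serial` is mine. The epoch of 1899-12-30 is a common trick rather than anything official: serial 1 is 1/1/1900, and Excel also counts a February 29, 1900 that never existed, so for any date from March 1900 onward the two offsets fold into that one starting point.

```python
from datetime import date

# Assumed epoch for Excel's 1900 date system. It absorbs both the
# "serial 1 = 1900-01-01" convention and Excel's phantom 1900-02-29,
# so it is only valid for dates on or after 1900-03-01.
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial(d: date) -> int:
    """Return the serial number Excel would store for date d."""
    return (d - EXCEL_EPOCH).days

print(excel_serial(date(2004, 1, 1)))  # 37987
```

So typing "1-1" in 2004 leaves you with 37987 in the cell, and nothing in that number tells you what was originally typed.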
Someday, hopefully by the end of the century, we'll all have competent office software.