(Smithsonian Magazine)   It seems scientists need to be better about backing up their data; according to new reports, nearly 90% of data in most studies and research are lost within twenty years   (blogs.smithsonianmag.com)
    More: Fail, Current Biology, molecular ecology, data store, reproducibility, light-years  

658 clicks; posted to Geek » on 29 Dec 2013 at 9:33 PM (35 weeks ago)



140 Comments
 
2013-12-29 08:15:07 PM
Clearly science can not be trusted
 
2013-12-29 08:26:21 PM

Heliovdrake: Clearly science can not be trusted


...so climate change is a myth, and you should vote Republican.
 
NFA [TotalFark]
2013-12-29 08:50:39 PM

Heliovdrake: Clearly science can not be trusted


Well perhaps but the evidence was lost...
 
2013-12-29 09:14:40 PM
The university I work for has a 20 year research project that is ending in the engineering school.  They've been running simulations for years and I came in to back up their data before the project was shut down. Half of their data was live on a SAN and easily backed up.  Anything older than 7 years was on tape in a room filled with farking tapes.  Thousands of them.  I figured we could hire students to load the tapes, move the data over to the SAN temporarily and then back it up.  The project team leader's response?  "Oh, the tape library broke 7 years ago.  We can't read those."

So what should have been a relatively inexpensive, uncomplicated project is now a logistical nightmare.  We found a local community college with a machine that can read the tapes, but they didn't label all of the stripe sets, so some are proving next to impossible to reconstruct.  And it's such a pain in the ass that we can't even use students, we found some old neckbeard at the community college who is working after hours to reconstruct the data for $40 an hour.  He should be done sometime in February.
 
2013-12-29 09:41:39 PM
FTFA: "Some of the time, for instance, it was saved on three-and-a-half inch floppy disks, so no one could access it, because they no longer had the proper drives,"

I can confirm. I even have the 5 1/4" floppy disks holding my Master's degree data, and a bunch of later stuff on Zip Disks. Sorry, Science.

Someday the Cloud will be obsolete and you'll have to transfer all your data to ectoplasm storage via direct mental link, or some damn thing.
 
2013-12-29 09:47:32 PM
I had some data invalidating that study, but I can't seem to locate it.....
 
2013-12-29 09:48:36 PM
That's less a backup issue and more of an archiving problem.
 
2013-12-29 10:01:20 PM
Up in Canada, PM Harper is giving away the government's science historical data-base to any 'university' who wants it. The rest goes into the dumpster.

/Can't have science interfering with the profits of his best Billionaire Buddys

If no one can be bothered to create new backups(or maintain old reading hardwares) to coincide with changes in storage, well you will be burned as a Witch for denying the 'dark-ages 2' non-electric boogaloo Priests.
 
2013-12-29 10:02:32 PM
All science is not created equal. How many documents about the statistical details of which side of the bifurcated penis of the ring-tailed lizard (ronicus jermimicus) uses during mating does humanity really need? Meanwhile, I betcha a dollar anything to do with advancing aircraft technology is rigorously updated and preserved.
 
2013-12-29 10:09:57 PM
If you're surprised by this, you've probably never been involved with real science.

If you're bothered by this, may I ask: Do you have your tax-deduction-claimed receipts and work-vehicle mileage records from 1993?

/ holy crap, 1993 was 20 years ago
// may have published real-science articles ~20 years ago
 
2013-12-29 10:17:02 PM
Apparently "Sad" tag was on a medium subby couldn't access.
 
2013-12-29 10:21:45 PM

MrBallou: FTFA: "Some of the time, for instance, it was saved on three-and-a-half inch floppy disks, so no one could access it, because they no longer had the proper drives,"

I can confirm. I even have the 5 1/4" floppy disks holding my Master's degree data, and a bunch of later stuff on Zip Disks. Sorry, Science.

Someday the Cloud will be obsolete and you'll have to transfer all your data to ectoplasm storage via direct mental link, or some damn thing.


this is why every so often, you move your data to the new format.
 
2013-12-29 10:22:10 PM

SFSailor: If you're surprised by this, you've probably never been involved with real science.

If you're bothered by this, may I ask: Do you have your tax-deduction-claimed receipts and work-vehicle mileage records from 1993?

/ holy crap, 1993 was 20 years ago
// may have published real-science articles ~20 years ago


I believe the main requirement was to hold all data for at least 5 years; longer if NIH/NIMH funded but not sure beyond that.
 
2013-12-29 10:23:41 PM
This is pretty common everywhere, not just in scientific research. Toy Story 2 had a ton of production data deleted and the backups turned out to be garbage. If it wasn't for someone taking home a copy (against policy), the data would have been lost, likely delaying the release of the movie and creating a huge added expense.

It's also rumored that the reason why Microsoft rewrote a significant portion of Access is because the source code for the previous version was lost. This kind of thing happens disturbingly often even without the issue of obsolete media.
 
2013-12-29 10:48:15 PM
The bible has been backed up and archived for at least 1500 years....but it seems a bit corrupted....wait...hold on, I got another call.
 
2013-12-29 10:48:56 PM

sheep snorter: Up in Canada, PM Harper is giving away the government's science historical data-base to any 'university' who wants it. The rest goes into the dumpster.

/Can't have science interfering with the profits of his best Billionaire Buddys

If no one can be bothered to create new backups(or maintain old reading hardwares) to coincide with changes in storage, well you will be burned as a Witch for denying the 'dark-ages 2' non-electric boogaloo Priests.


I can't wait for the day when we are finally rid of that asshole. The only question is what kind of Canada will be left?
 
2013-12-29 11:00:24 PM
Hell, I can't remember where I put my keys 5 minutes ago.
 
2013-12-29 11:16:49 PM

NFA: Heliovdrake: Clearly science can not be trusted
Well perhaps but the evidence was lost...


Lisa: Dad, a tornado is heading toward Springfield!
Homer: Oh Lisa, don't be silly. Springfield hasn't had a tornado in its entire recorded history.
Lisa: Yeah, but the records only go back to 1973 when the Hall of Records was mysteriously blown away.
 
2013-12-29 11:21:36 PM

MrBallou: FTFA: "Some of the time, for instance, it was saved on three-and-a-half inch floppy disks, so no one could access it, because they no longer had the proper drives,"

I can confirm. I even have the 5 1/4" floppy disks holding my Master's degree data, and a bunch of later stuff on Zip Disks. Sorry, Science.

Someday the Cloud will be obsolete and you'll have to transfer all your data to ectoplasm storage via direct mental link, or some damn thing.


I'm sure you could find a 5 1/4" floppy drive around somewhere (probably used).

As for 3.5" drive you can buy them new for less than $15:

http://www.newegg.com/Product/Product.aspx?Item=9SIA1PU0FD9579&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=6146953&SID=ueua30q5p9dx

If the data is not worth spending $15 to retrieve, it's probably not worth saving.
 
2013-12-29 11:27:18 PM
Everything we ever published (~200x, over the last 35 years) came with the stipulation we save the data for 5 years.

...still have it all.  Kept transferring to new storage forms.  Funny how that mountain of floppies from the 80's now only needs a fraction of the space on a flash drive...
 
2013-12-29 11:28:54 PM
Things can be really bad for some forms of data. Not too long ago, instruments usually had thermal printers for output but most labs didn't bother trying to set up direct data collection to a computer (too much hassle). Instead, the printouts were taped into a notebook. What happens to thermal printouts over a decade or more? They stop being readable.

There are also instruments that had proprietary output formats, so even if they did go to computer and the file is still around, it can't be read.
 
2013-12-29 11:33:14 PM

gfid: As for 3.5" drive you can buy them new for less than $15:

http://www.newegg.com/Product/Product.aspx?Item=9SIA1PU0FD9579&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=6146953&SID=ueua30q5p9dx

If the data is not worth spending $15 to retrieve, it's probably not worth saving.


It's not always that cut-and-dried.  I had a prof ask me to offload some files from some pedestrian-looking 3.5" floppies.  Turns out that Macs were using a totally different system (constant angular velocity vs constant linear velocity?) for their 3.5" floppies in the 400k/800k era (the 1984-88 era) than modern PC-compatible drives.  Like... literally, the hardware won't read the old disks. Macs started going to the latter after 1988, but many up to the mid-90s supported the old floppies.  Fortunately, a few years ago, we still had a vintage Mac available to copy the files.  Today, we'd have to troll fleaBay for one.
 
2013-12-29 11:36:34 PM

Ishkur: NFA: Heliovdrake: Clearly science can not be trusted
Well perhaps but the evidence was lost...

Lisa: Dad, a tornado is heading toward Springfield!
Homer: Oh Lisa, don't be silly. Springfield hasn't had a tornado in its entire recorded history.
Lisa: Yeah, but the records only go back to 1973 when the Hall of Records was mysteriously blown away.


ummm. . .you mean hurricane. . .

We do contract work for application to the EPA.  We have to keep that work on file for several years after the EPA approves of the chemical.  Since we do work before the EPA sees the final results, the final application could be a decade or more after we have done our work.

We have nearly forty years of data in our archives.
 
2013-12-29 11:40:01 PM

Lawnchair: It's not always that cut-and-dried.  I had a prof ask me to offload some files from some pedestrian-looking 3.5" floppies.  Turns out that Macs were using a totally different system (constant angular velocity vs constant linear velocity?) for their 3.5" floppies in the 400k/800k era (the 1984-88 era) than modern PC-compatible drives.  Like... literally, the hardware won't read the old disks. Macs started going to the latter after 1988, but many up to the mid-90s supported the old floppies.  Fortunately, a few years ago, we still had a vintage Mac available to copy the files.  Today, we'd have to troll fleaBay for one.


I'll go out on a limb and say that if you really want the data, you'll find the time to get the necessary hardware to do it.  The hardware exists, you just have to put out a small amount of effort to acquire it.  I know several people who still possess Commodore 64s.  I am certain I could find an Apple ][e if I wanted to.  All of the normal PC formats (5 1/4, 3 1/2; DD, HD) would be even easier still.
 
2013-12-29 11:54:25 PM
Well if they kept it safe someone might use it to disprove their findings.
 
2013-12-30 12:10:30 AM
I worked in the real sciences, and therefore still have all the data.

However, if I worked in an imaginary "science", like climatology, I could discard the data at my whim. In the words of Phil Jones, "we have the data protection act, which I will hide behind".

In English, Phil is saying, "so what if all the data is made up, fark you."

That's how climate "science" works.
 
2013-12-30 12:29:17 AM
i spent the last half hour rummaging through boxes looking for a scsi adapter to see if an old iomega jaz disk is still readable, so i'm getting a kick out of this...
 
2013-12-30 12:31:19 AM

SevenizGud: I worked in the real sciences, and therefore still have all the data.


Collecting your spunk in the freezer isn't "science", even if you wear a lab coat.
 
2013-12-30 12:35:09 AM
I suspect that later ages are going to look at the 20th and 21st centuries as a dark age for sciences, specifically "social" sciences.
 
2013-12-30 12:37:17 AM

SuperT: MrBallou: FTFA: "Some of the time, for instance, it was saved on three-and-a-half inch floppy disks, so no one could access it, because they no longer had the proper drives,"

I can confirm. I even have the 5 1/4" floppy disks holding my Master's degree data, and a bunch of later stuff on Zip Disks. Sorry, Science.

Someday the Cloud will be obsolete and you'll have to transfer all your data to ectoplasm storage via direct mental link, or some damn thing.

this is why every so often, you move your data to the new format.


Or, alternatively, just print the shiat out on paper and store it. Guaranteed, it will still be readable 100 years from now with no requirement to keep converting to new data formats and media.
 
2013-12-30 12:38:51 AM
I don't know about the other science folks, but at least in Chemistry research at the university level, lab notebooks of procedures and data are standard. Data is printed out or written in when digital. When I left with my MS I had a stack of notebooks as high as an elephant's eye, literally. Yes, original digital data can be lost or become harder to access, but increases in hard drive sizes, flash drives and CD-R and DVD-R have made transferring between old computers and their replacements far easier. Still, though, most papers should have the necessary data in the form of percent yields and descriptions of data and procedures. Student theses also contain lots of the same data. My thesis had 100+ pages of spectra and spectrum data. At the university I went to, theses had to be printed in this manner: at minimum, a 100% cotton paper copy was archived by the library, and 25% cotton copies were also made up for library circulation, department storage, and one copy for the thesis advisor.
Here are some choice bits of the paper this article was based on:
"Our reason for needing the data (a reproducibility study) was not especially compelling for authors "
"Papers were also excluded if the data were already available as a supplementary file or appendix or on another website, as curation of these data sets is no longer the responsibility of the author."
"In every case, we attempted to find e-mail addresses for the first, corresponding, and last authors of every paper. "
"We therefore also searched online for a maximum of 5 min per author for a recent or current e-mail address. "
I'll let you decide on your own what to think about some of that.
 
2013-12-30 12:47:46 AM

dittybopper: SuperT: MrBallou: FTFA: "Some of the time, for instance, it was saved on three-and-a-half inch floppy disks, so no one could access it, because they no longer had the proper drives,"

I can confirm. I even have the 5 1/4" floppy disks holding my Master's degree data, and a bunch of later stuff on Zip Disks. Sorry, Science.

Someday the Cloud will be obsolete and you'll have to transfer all your data to ectoplasm storage via direct mental link, or some damn thing.

this is why every so often, you move your data to the new format.

Or, alternatively, just print the shiat out on paper and store it. Guaranteed, it will still be readable 100 years from now with no requirement to keep converting to new data formats and media.


yeah...it seems that our embrace of technology has in some ways reverted us back to just having an oral history. Could a future historian or archaeologist ever be able to extract meaningful records from a hard drive?
 
2013-12-30 12:57:01 AM

Nuclear Monk: yeah...it seems that our embrace of technology has in some ways reverted us back to just having an oral history


Revert is a strong word. We've never really lost oral tradition. The language changes, but Caligula wanted to be a poet every bit as much as Kanye West. And like Kanye, he was really bad at it and everyone thought he was insane.
 
2013-12-30 12:58:29 AM
Preservation of computer data is hard. Frequent data format changes make maintaining data like this a very expensive endeavor. Punch cards, 5MB hard drives, 8" floppies, ZIP, JAZ, a multitude of tape backup formats. It's a nightmare.
 
2013-12-30 01:14:07 AM
That's OK, they can just make up more.
 
2013-12-30 02:28:42 AM
Lost being a euphemism for buried when it doesn't sufficiently $upport the dominant hegemony.
 
2013-12-30 02:49:54 AM

doglover: All science is not created equal. How many documents about the statistical details of which side of the bifurcated penis of the ring-tailed lizard (ronicus jermimicus) uses during mating does humanity really need? Meanwhile, I betcha a dollar anything to do with advancing aircraft technology is rigorously updated and preserved.


The thing is, though, you never quite know what's going to turn out to be important later. There have been countless studies over the years where you go in studying one thing, and then you end up collecting some interesting data on a completely different thing totally by accident. The Milgram experiment, for instance, had a precursor in Carney Landis examining facial expressions in 1924. He was looking to examine the facial expressions made when someone, oh say, killed a rat. He got pushback from his subjects at about the same ratio as the Milgram experiment would get decades later, and Landis noted the reluctance of the subjects, but that wasn't what he was aiming to study. When someone did refuse to kill the rat, he just did it himself, because he wanted the facial expressions.

Maybe your ring-tailed lizard study ends up in the lizards being slower to mate at all when the researcher was in the room. That could be of use if the lizard ends up on the Endangered Species List at some point, so they know how best to get them to repopulate.
 
2013-12-30 02:56:34 AM
Anybody want to check the NSA? As long as they're hoarding that data, they could do something useful.
 
2013-12-30 03:22:23 AM

LargeCanine: I suspect that later ages are going to look at the 20th and 21st centuries as a dark age for sciences, specifically "social" sciences.


I suspect that every age is going to look back at the previous age as a dark age.
 
2013-12-30 04:25:56 AM

NFA: Heliovdrake: Clearly science can not be trusted

Well perhaps but the evidence was lost...


Which means that if you doubt it, it can never be proven either way...

flemardo: I don't know about the other science folks, but at least in Chemistry research at the university level, lab notebooks of procedures and data are standard. Data is printed out or written in when digital. When I left with my MS I had a stack of notebooks as high as an elephant's eye, literally. Yes, original digital data can be lost or become harder to access, but increases in hard drive sizes, flash drives and CD-R and DVD-R have made transferring between old computers and their replacements far easier. Still, though, most papers should have the necessary data in the form of percent yields and descriptions of data and procedures. Student theses also contain lots of the same data. My thesis had 100+ pages of spectra and spectrum data. At the university I went to, theses had to be printed in this manner: at minimum, a 100% cotton paper copy was archived by the library, and 25% cotton copies were also made up for library circulation, department storage, and one copy for the thesis advisor.
Here are some choice bits of the paper this article was based on:
"Our reason for needing the data (a reproducibility study) was not especially compelling for authors "
"Papers were also excluded if the data were already available as a supplementary file or appendix or on another website, as curation of these data sets is no longer the responsibility of the author."
"In every case, we attempted to find e-mail addresses for the first, corresponding, and last authors of every paper. "
"We therefore also searched online for a maximum of 5 min per author for a recent or current e-mail address. "
I'll let you decide on your own what to think about some of that.


I make of it some allegory and a restating of the investigation criteria designed to make the study possible by restricting time spent.
 
2013-12-30 05:55:07 AM

dittybopper: Or, alternatively, just print the shiat out on paper and store it. Guaranteed, it will still be readable 100 years from now with no requirement to keep converting to new data formats and media.


How do you print a few TB of data?
And how would you get it back into a usable format once you've printed it?
 
2013-12-30 06:33:00 AM
I'm going through a bunch of old 3.5" floppies and copying the useful data as I read this, so I'm getting a kick. Word 2013 does a remarkably good job with WordPerfect files. Found my DOS 6.22 install disks and Microsoft Arcade, along with my TAPCI$ disk.

We still have an old PC tower with a tape drive and LS-120 drive. I need to get it fired back up to check about 10 LS-120's I have sitting around.

/Dang I feel old.
 
2013-12-30 07:32:58 AM
My first job after graduating was backing up x-ray crystallography data from PDP to 5.25, then via KERMIT to...I don't know, the other side was somebody else's problem.

refl.foa

/getting a kick, etc.
 
2013-12-30 07:49:30 AM
Back in the late 1980s or early 1990s I first heard the saying that digital lasts forever or five years, whichever comes first. This is old news to anyone who's ever done a backup.
 
2013-12-30 07:55:15 AM

dittybopper: SuperT: MrBallou: FTFA: "Some of the time, for instance, it was saved on three-and-a-half inch floppy disks, so no one could access it, because they no longer had the proper drives,"

I can confirm. I even have the 5 1/4" floppy disks holding my Master's degree data, and a bunch of later stuff on Zip Disks. Sorry, Science.

Someday the Cloud will be obsolete and you'll have to transfer all your data to ectoplasm storage via direct mental link, or some damn thing.

this is why every so often, you move your data to the new format.

Or, alternatively, just print the shiat out on paper and store it. Guaranteed, it will still be readable 100 years from now with no requirement to keep converting to new data formats and media.


Paper lasts a long time, but it consumes a lot more physical space.
You start getting into the heighty data collection jobs and there's just no way to keep it all in a file cabinet.

/It ends up getting shredded, burned, or flooded out anyway... unless you've got a proper storage facility.
/To which you could have put a cloud style server in there from the start, now that we have those.
 
2013-12-30 08:08:45 AM
You don't think they actually want people to try and repeat their results, do you? C'mon, what the hell kind of science is that? You're just supposed to publish as much as possible as quickly as possible under the assumption no one will actually care to understand what you wrote and the grants will keep flowing.
 
2013-12-30 08:24:45 AM
I did tape backups in college for the math and comp sci departments. Dailies and weeklies stayed in the office and monthlies and annuals were sent to Iron Mountain. I forget what the annual retention was - no way it was 20 years though.
 
2013-12-30 08:40:50 AM

IwasKloot: I did tape backups in college for the math and comp sci departments. Dailies and weeklies stayed in the office and monthlies and annuals were sent to Iron Mountain. I forget what the annual retention was - no way it was 20 years though.


I worked in a "distance learning" production center in college. Taping and distributing graduate-level classes on U-matic cassettes.
 
2013-12-30 08:53:22 AM
Look for the paper.
 
2013-12-30 08:59:04 AM

Touched Inappropriately By The Hand Of God: dittybopper: Or, alternatively, just print the shiat out on paper and store it. Guaranteed, it will still be readable 100 years from now with no requirement to keep converting to new data formats and media.

How do you print a few TB of data?
And how would you get it back into a usable format once you've printed it?


How often do you need to store a few TB of data?  Don't forget that for something like a space mission, a large portion of the data sent back is images.  In fact, that tends to dominate the bandwidth.  When you look at it, an uncompressed image that would fit on a standard 8.5 x 11" paper at 300dpi and 16 bits per pixel would be about 128 megs.  A Terabyte is roughly about a million megabytes, so you'd need about 8,000-ish of them to constitute a terabyte.

Quality 100 lb coated gloss book paper measures .0048" by caliper, so 8,000 pages would take up (8,000 * .0048)/12 = ~ 3.2 feet of shelf space.  Let's be generous and say 4 feet, though, to account for covers.

By that standard, the space under the table I use as a night stand, where I store my current "reading books" holds about 2.5 terabytes of data.  The bookshelves downstairs in the basement hold approximately 8 terabytes.
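For anyone who wants to check the page and shelf arithmetic, here is a minimal sketch. It assumes an uncompressed image filling the whole sheet and a decimal terabyte (10^12 bytes); the function names are made up for illustration. Run as written, the raw size of one page comes out near 16.8 megabytes, and the 128 figure matches the same size expressed in megabits (134,640,000 bits / 2^20 is about 128.4).

```python
# Back-of-the-envelope check of the "print it on paper" numbers.
# Assumptions: uncompressed image covering the full sheet, decimal terabyte.
PAGE_W_IN, PAGE_H_IN = 8.5, 11      # US letter, inches
DPI = 300                           # dots per inch
BITS_PER_PIXEL = 16
SHEET_THICKNESS_IN = 0.0048         # 100 lb coated gloss book paper, per the post
TERABYTE = 10**12                   # bytes


def page_bytes():
    """Raw size in bytes of one full-page image."""
    pixels = round(PAGE_W_IN * DPI) * round(PAGE_H_IN * DPI)
    return pixels * BITS_PER_PIXEL // 8


def pages_per_terabyte(bytes_per_page):
    """How many pages of that size add up to one terabyte."""
    return TERABYTE / bytes_per_page


def shelf_feet(pages):
    """Shelf length in feet for a stack of that many sheets."""
    return pages * SHEET_THICKNESS_IN / 12


if __name__ == "__main__":
    size = page_bytes()             # 16,830,000 bytes, about 16.8 MB
    pages = pages_per_terabyte(size)
    print(f"bytes per page:     {size:,}")
    print(f"pages per terabyte: {pages:,.0f}")
    print(f"shelf feet per TB:  {shelf_feet(pages):.1f}")
    print(f"shelf feet for 8,000 pages: {shelf_feet(8000):.1f}")
```

The 8,000-page stack does work out to 3.2 feet, as stated. At about 16.8 MB per page, though, a terabyte would be closer to 59,000 pages, or roughly 24 feet of shelf.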

And how is a printed format not usable?  Have you forgotten how to read?  But even barring that, they've got this new technology called "scanning" that allows you to automatically enter data into computer usable format.  Print the data off as either an image, or if the data is such that that isn't practical, as OCR characters.

You're stuck thinking "technology".  You see modern problems and you think there must be a modern solution to them.  Sometimes there isn't.

That doesn't, of course, mean that you should abandon modern technology, just be aware of its limitations.  I write software for a living.  Been programming since the early 1980's, and professionally since the mid-1990's.  I've seen a *LOT* of changes in that time.  I have a little reminder I keep at my desk at work:  An 8" floppy disk that, according to the label, has some source code on it, and a book entitled "The Mythical Man-Month" by Fred Brooks.  Both are from 1982.

Guess which one I can still read.
 
Displayed 50 of 140 comments


This thread is closed to new comments.
