Long Term Data Preservation


New issue of International Journal of Digital Curation

IJDC Vol 4, No 2 (2009)

Editorial by Chris Rusbridge, 8 research papers, and 6 general articles. The peer-reviewed articles are from the 4th International Digital Curation Conference, held in December 2008.

To quote Rusbridge, “A varied and fascinating collection, indeed!”

By the way, the 5th IDCC is coming up! December 2-4 at the Millennium Gloucester Hotel in Kensington, London.


Researchers create nanotube memory that can store data for a billion years

This is some of the best news I’ve had all day.

See brief article at engadget.


DRM and digital preservation

We’ve long known the dangers that DRM poses to long-term access to digital content. Anything that restricts access to content now threatens access to it later down the line. This article posted on Slashdot is a great illustration of the problem.


WARC file format becomes an ISO standard

WARC, an extension of the ARC file format, used for archiving web material, has been made an ISO standard.

WARC format offers new possibilities, notably the recording of HTTP request headers, the recording of arbitrary metadata, the allocation of an identifier for every contained file, the management of duplicates and of migrated records, and the segmentation of the records. WARC files are intended to store every type of digital content, either retrieved by HTTP or another protocol.

Standardization offers a guarantee of durability and evolution for the WARC format. It will help web archiving entering into the mainstream activities of heritage institutions and other branches, by fostering the development of new tools and ensuring the interoperability of collections. Several applications are already WARC compliant, such as the Heritrix [http://crawler.archive.org/] crawler for harvesting, the WARC tools [http://code.google.com/p/warc-tools/] for data management and exchange, the Wayback Machine [http://archive-access.sourceforge.net/projects/wayback/], NutchWAX [http://archive-access.sourceforge.net/projects/nutch/] and other search tools [http://code.google.com/p/search-tools/] for access. The international recognition of the WARC format and its applicability to every kind of digital object will provide strong incentives to use it within and beyond the web archiving community.

- Abbie Grotke, IIPC Communications Officer, Library of Congress

See the IIPC press release.
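
For the curious, here is a rough idea of what "an identifier for every contained file" and "the recording of HTTP request headers" look like in practice. This is only a minimal sketch in Python, assuming the WARC/1.0 record layout (a version line, named header fields, a blank line, the content block, then two CRLFs). The helper name build_warc_record and the example.org request are my own illustrations, not part of any of the tools listed above, and a real crawler would record more fields than this.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(record_type, target_uri, payload, content_type):
    """Serialize one WARC/1.0 record as bytes (hypothetical helper, not a full writer)."""
    headers = [
        ("WARC-Type", record_type),
        # Every record carries its own globally unique identifier.
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # Length of the content block in bytes, excluding the trailing CRLFs.
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join("%s: %s\r\n" % h for h in headers)
    # Header block, blank line, content block, then two CRLFs end the record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


# An invented HTTP request, stored as a "request" record.
request = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
record = build_warc_record(
    "request", "http://example.org/", request, "application/http;msgtype=request"
)
print(record.decode("utf-8"))
```

A WARC file is essentially a series of records like this one written back to back, which is part of why the format lends itself to storing content retrieved by something other than HTTP.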


Digital Preservation and Nuclear Disaster: An Animation

I have posted this fabulous, fabulous video everywhere but here. It is things like this that make me proud to be in this field. Enjoy!


And it gets even better...

Synthetic Gene Networks that Count. From Science Magazine.

Synthetic gene networks can be constructed to emulate digital circuits and devices, giving one the ability to program and design cells with some of the principles of modern computing, such as counting. A cellular counter would enable complex synthetic programming and a variety of biotechnology applications. Here, we report two complementary synthetic genetic counters in Escherichia coli that can count up to three induction events: the first, a riboregulated transcriptional cascade, and the second, a recombinase-based cascade of memory units. These modular devices permit counting of varied user-defined inputs over a range of frequencies and can be expanded to count higher numbers.

Another inspired piece brought to you by Life Inspired.


Interactions between zebrafish pigment cells responsible for the generation of Turing patterns

Have I also mentioned that I am interested in the ways information is formed and displayed in nature? This is a great article about zebrafish patterns and their relation to Turing patterns. [From the Proceedings of the National Academy of Sciences of the United States of America (PNAS)]

This is yet another great article I found via the Life Inspired blog.


One if by land, two if by sea

As part of my theorizing about ways to preserve information during a so-called “Digital Dark Age,” I have spent a fair amount of time thinking about and looking for ways to imprint or encode messages without using electricity. You know, besides writing on paper, carving messages in stone, or micro-etching in metals. Realistically, we can use any type of medium to arrange patterns that communicate messages to each other, and use any number of methods to contain and release these messages. We are so wrapped up these days in sending all of our messages through electronically operated conduits (for good reasons: they’re really effective) that novel non-electrical means of communicating can take us by surprise.

David Walt of Tufts and George Whitesides of Harvard have developed self-powered “infofuses” which use chemical reactions to communicate. From what I can tell, you still need electricity to encode the fuses (they are currently using micropipettors and inkjet printers to encode them), and you need some kind of device to read the message. I still have a lot of questions about the process and its effectiveness in the field, but it’s an interesting approach at the very least.


Washington gets on the digital science data wagon

Read the Government Computer News article.

“Our nation’s continuing leadership in science relies increasingly on effective and reliable access to digital scientific data,” John H. Marburger III, director of the president’s Office of Science and Technology Policy (OSTP), said in releasing the report. “Researchers and students who can find and re-use digital data are able to apply them in innovative ways and novel combinations for discovery and understanding.”

And check out the White House report, “Harnessing the power of digital data for science and society.” [pdf]

Thanks Mike Brown!


UNC resident Nobel Laureate talks on media storage problems

This Daily Tar Heel article spotlights a lecture given by Nobel Laureate Oliver Smithies, professor in the department of pathology and laboratory medicine at UNC. Smithies won the 2007 Nobel Prize for his work on gene targeting. While most of the lecture attendees were expecting a talk on this work, Smithies apparently surprised them with a sobering discussion of the problems with outdated storage media.

...he brought a collection of his old notebooks to the event, along with a floppy disk, CD and other storing devices.

“They’re all going to be out of date,” Smithies said regarding the CD and the floppy disk.

“We have to get that information down somewhere. That’s the problem with information science that you have to think about.”

We still have so much work to do to preserve access to data at the institutional level, but I am wondering what, besides nagging people to keep their data moving, we can do to ameliorate the problem for the individual. And I just get the chills any time someone advises the greater population that the best way to preserve digital information is to “print it out.” Chills.