Long Term Data Preservation


Internet Archive gets a new data center

Computerworld has the story here.

The machine fits in a 20-foot-long outdoor metal cargo container filled with 63 server clusters that offer 4.5 million gigabytes of data storage capacity and 1TB of memory.

That only makes me shiver a little bit.


More news from the home front

First, I can’t believe I have neglected to encourage you to attend the DigCCurr 2009 symposium. If you haven’t already registered, you still have time. Just about every name worth mentioning in digital curation will be there, and it will be an incredible opportunity to meet some of these names face to face.

Also, the report from the second International Data curation Education Action working group workshop has just been published in D-Lib and is worth reading. I believe the third workshop will be happening in and around the DigCCurr 2009 symposium, so keep your eyes peeled for that report.


CBS on data rot

David Mattison of the Ten Thousand Year Blog posted about this CBS article on data rot. It’s another solid sign that the greater populace is running into real problems with digital longevity. Two years ago, I was met with blank stares and people asking me, “What do you mean, I won’t be able to read my files in five years?” Now they’re talking about it on CBS news. The problem is here and real, and we in the trenches are taking note.


Computer Data Storage Through the Ages — From Punch Cards to Blu-Ray

A Maximum PC article about our beloved data storage methods through time. I remember coloring on used punch cards that my parents would bring home, and I definitely remember my brother and me typing up games in BASIC and storing them on cassette tape. Ah, those were the days…


New PLANETS Newsletter

Check out the new PLANETS newsletter [leads to a page where you can download the PDF, or you can download the PDF HERE].

It mostly covers the impending 2010 release of the PLANETS testbed, which purports to be able to:

• analyse systematically and verify, based on evidence, different digital preservation strategies such as: characterisation of digital objects, migration and emulation.

• run a range of digital preservation tools such as JHOVE, PS2PDF, ImageMagick, Sanselan, MSWord migration, HTMLCleaner.

• test various combinations of preservation workflows such as migration of DOC to PDF or PDF/A.

• test preservation strategies on different types of digital objects such as text, image, audio and video.

• measure and compare the results against pre-defined benchmarks.

Pretty cool. I am excited to see so many tools finally being launched after so much discussion, work, and anticipation. It looks like external institutions will be able to try it out as early as Summer 2009. Hopefully, we at UNC can get our hands on it to test drive!


Xena – Digital Preservation Software

xena.sourceforge.net

Developed by the National Archives of Australia. Has anyone seen this? Used it?

“Version 4.2.1 released, 12 January, 2009

Xena is free and open source software developed by the National Archives of Australia to aid in the long term preservation of digital records. Xena is an acronym meaning ‘Xml Electronic Normalising for Archives’.

Xena is written in Java and is therefore cross-platform, running on Linux, Windows and OS X.

Xena software aids digital preservation by performing two important tasks:

  • Detecting the file formats of digital objects
  • Converting digital objects into open formats for preservation”

I have yet to discover any software that successfully performs migration activities, and even normalization actions on ingest seem to be a little sketchy, so this excites me a lot.

If anyone has any knowledge of, experience with, or any more information about this, do please share!
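
As an aside, for anyone wondering what “detecting the file formats of digital objects” can look like at its very simplest, here is a minimal Python sketch. To be clear, this is not how Xena does it (I have not looked at its internals); it is just the common “magic number” approach, where you compare the first few bytes of a file against known signatures. The signature table and the function names `guess_format_from_bytes` and `guess_format` are my own, made up for illustration, and a real tool would rely on a much larger, curated registry.

```python
# Illustrative signature table: leading "magic" bytes for a handful of formats.
# This short list is an assumption for the sketch, not a complete registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
]


def guess_format_from_bytes(head: bytes) -> str:
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def guess_format(path: str) -> str:
    """Read the first 16 bytes of a file and guess its format from them."""
    with open(path, "rb") as f:
        return guess_format_from_bytes(f.read(16))
```

Calling `guess_format("some_file.pdf")` on a real PDF should come back with "PDF" no matter what the file extension says, which is the whole point: extensions lie, leading bytes usually don't. Note that container formats make this harder, since a ZIP signature alone cannot tell you whether you are holding a plain archive or an office document packaged inside one.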


Another article on KEEP

KEEP being the European emulator project, Keeping Emulation Environments Portable.

From ITexaminer.com: Attempt made to save digital information for future

Interesting: “The KEEP project goes beyond a modular emulator in that it is being called “universal” and “the world’s first general purpose” emulator. It carries a price tag of $5.2 million, contributed by the European Union.”

$5.2 million?? Yow.

I’m feeling a little frustrated though, because all I can find is news story after news story, and no solid spec info. Apparently, Dan Pinchbeck “and team” are AIMING at developing this thing, so possibly there just isn’t much out there yet, and if anything, we’ll just have to ask THEM what their plans are. Or wait.

Here’s the news page from the University of Portsmouth, too, if you’re interested.



DigCCurr Professional Institute Registration NOW OPEN

I have the great opportunity to be participating in the organization of the DigCCurr Professional Institute: Curation Practices for the Digital Object Lifecycle.

Check out the institute page HERE.

The institute consists of one five-day session in June 2009 and a two-day follow-up session in January 2010. Each day of the June session will include lectures, discussion and a hands-on “lab” component. A course pack and a private, online discussion space will be provided to supplement learning and application of the material. An opening reception, break time snacks and drinks, and a dinner on Thursday will also be included.

Institute instructors are:

  • From the University of North Carolina at Chapel Hill: Carolyn Hank, Dr. Cal Lee, Dr. Richard Marciano, Dr. Helen Tibbo. Assisted by Heather Bowden.
  • Dr. Nancy McGovern, from the University of Michigan.
  • Dr. Seamus Ross, from the University of Toronto.
  • Dr. Manfred Thaller, from the University of Cologne.

Topics of the institute include:

Monday

  • Digital curation program development
  • LAB – DRAMBORA and/or PLATTER in action

Tuesday

  • Strategies for engaging data communities
  • Characterizing, analyzing and evaluating the producer information environment
  • Submission and transfer scenarios – push and pull (illustrative examples from DICE group projects)
  • Defining submission agreements and policies
  • LAB – Assessing File Format Robustness
  • Importance of infrastructure independence

Wednesday

  • Overview of the digital preservation problem
  • Managing in response to technological change
  • Characterization of digital objects
  • LAB – Creating Ingest rules in iRODS
  • From rules to trust – forms of evidence that a repository is doing the right things

Thursday

  • Access and use considerations
  • Access and user interface examples from DICE
  • How and why to conduct research on digital collection needs
  • LAB – Analyzing server logs and developing strategies based on what you find
  • Returning to first principles – core professional principles that should drive digital curation

Friday

  • Overview and characterization of existing tools
  • LAB – Evaluating a set of software options to support a given digital curation workflow
  • Formulating your six-month action plan – task for each individual, with instructors available to provide guidance

As you can see, it’s a content-rich institute. We will all be learning a LOT. What is different about this institute is that we will be inviting all of the participants back in January 2010, so we can all discuss how the principles we learned in the first week were effectively and/or ineffectively applied in the real world.

I admit my bias when I say I think it’s well worth participating in this if you are a practitioner in the field of digital archives, repositories, libraries, curation, etc. As this field is developing, it is very important that we are able to get together and solidify how and why we are going to tackle the challenges that we are facing.

I hope to see you there!!


More links

Video Paradiso: how an Italian town rescued a priceless film collection
This is just a neat story that shows how what a community cares about can affect the longevity of an archive.

David Rosenthal on format specifications
Good stuff that I need to read again.

Discussion summary on format specifications
From Chris Rusbridge’s digital curation blog.

Image Fortress Launches Online Archive of World’s Space Exploration Imagery
Neat. They claim to provide organizations with “fully automated, online digital archiving services that ensure the secure, long-term preservation and integrity of electronic documents and still and video imagery.” I am deeply curious to know how they accomplish this.

German lovers – aged six and five – try to elope to Africa
This snuck into my delicious archives. Adorable.


New love to longtermdata blog

After surviving my first semester as a PhD student, I think I just might be able to devote a little more time to the longtermdata blog. I’ve added some of my favorite blogs to the blogroll and am going through the painful process of finding a suitable theme. Bear with me on the theme thing… it’s rough out there.

In the past month or so I have started to collect some good links and stories which are all somehow related to long term digital preservation. Here’s a good chunk of them:

A Tool to Verify Digital Records, Even as Technology Shifts
This is mostly about research happening at the University of Washington, but Stewart Brand is quoted at the end of the article about the Format Exchange, a project which I have been dabbling in.

Digital Preservation Challenge
DPE’s latest digital preservation challenge. A great great great idea. I encourage anyone to take it on.

Got Data?
If you have access to a subscription to ACM, check this out. “Tools for surviving a data deluge to ensure your data will be there when you need it.”

ExLibris Group Releases Digital Preservation System
!!

Movage
Kevin Kelly’s ‘movage’ idea infiltrates the digital curation community. Hoo-ray!

Kevin Kelly on Movage
Yes, yes, and yes.

That is probably more than enough to keep you all busy. I’ll be bringing more of the old and more new stuff to you in the very near future.