What We Need for Durable Digital Archives

We have stone tablets dating back at least 4000 years. We still have some of Gutenberg’s Bibles. I’m pretty sure that any pen marks I put to paper will survive at least my lifetime, barring gross negligence. But I have no idea whether any of the digital formats I’m currently using to write and store these words will even be around a few years from now.

Every pre-digital medium was inherently archival. You never had to think about the words you recorded being preserved; they just were. Preservation was a fundamental requirement of recording knowledge. If it didn’t last, what was the point?

My recent post on the disappearing digital archive described how ephemeral archived writing has become in the digital age.[1] In this post, I want to explore the requirements of an archival system that would actually work: one that operates silently and consistently, with little to no conscious intervention on the part of its user.

Archival by Default

This has already been implied, but an archival system worth its salt must take conscious effort to circumvent, much like ripping out an aborted draft from a typewriter, crumpling the paper, and heaving it into the fireplace. Until that moment, the sheet of paper was, depending on its quality, archival-quality material that could last at least a century.

So, whatever we type, on whatever device we type it, in whichever application we use, archival needs to happen without our thinking about it. This will of course have many implications, but let’s not get ahead of ourselves here. I’m trying to keep things as simple as possible.

Networked

Our unthinking, archival-by-default system (let’s call it UAD) will almost certainly need to talk to some form of networked “cloud”-based service in order to be accessible on the proliferating devices we use to record our thoughts. A common workflow for me is to jot down a quick thought on my iPhone, which I’ll later pull up on my Mac or iPad for more in-depth writing with a better keyboard.[2] Apple’s iCloud service makes all of this a seamless experience, but it also locks me into their platform and threatens the longevity and durability of my archive (more on that later).

Privacy

If everything we write is archived by default, we would want those archives to stay private until we explicitly choose to release or publish something for public consumption. But if UAD is networked, it becomes a ripe target for invasion by people and entities of many motivations, be they political (NSA), financial (thieves), or personal (anyone who has ever had a grudge against someone else, aka everyone).

While there are ways to be secure in a networked environment, they aren’t generally known or employed by everyday people whose eyes glaze over when you start talking about public keys, or PGP, or the merits of different hashing algorithms. If archival is to happen by default, without thinking, then good security needs to also be the unthinking default that it would take conscious work to circumvent.

Open Source

As I mentioned earlier, I already have a pretty good networked syncing service in iCloud, at least in terms of the user experience: everything is synced mostly seamlessly to all of my devices with little to no thought on my part. But I do have to trust that Apple keeps my data private from any prying entity, which I’m not that confident of (especially should that entity be the NSA). I also have to trust that they will continue operating this service, and manufacturing devices that can connect to it, beyond my death. That is more blind faith than I have.

Instead, UAD needs to be open source. I don’t trust Apple or Google or Dropbox or Microsoft to prioritize the longevity and accessibility of my archive once I’m no longer their paying customer. I would certainly find it cumbersome to move from one system to another, and even if my archive should endure, the APIs and data structures used to store my writing would be proprietary and hard to access, assuming the service was even still around.

An open source protocol and storage format would at least be well-documented and freely inspectable for posterity. And, since anyone could inspect it and run it themselves, it would be more resistant to state-level privacy invasions. It would also be more likely to stay secure, particularly if the system automatically updated itself to the latest security releases.
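To make “well-documented and freely inspectable” a little more concrete, here is a minimal sketch of what one record in such a format could look like. Everything in it is my own assumption rather than an existing standard: the note is stored as plain UTF-8 text, named after its SHA-256 hash, with a small JSON sidecar describing it. The function names (`archive_note`, `verify_note`) and the sidecar fields are purely illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def archive_note(text: str, archive_dir: Path) -> Path:
    """Store a note as plain UTF-8 text plus a JSON sidecar describing it."""
    data = text.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Content-addressed file name: the same text always maps to the same file.
    note_path = archive_dir / f"{digest}.txt"
    note_path.write_bytes(data)

    # Hypothetical sidecar layout; the field names are illustrative only.
    sidecar = {
        "format": "text/plain; charset=utf-8",
        "sha256": digest,
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar_path = archive_dir / f"{digest}.json"
    sidecar_path.write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
    return note_path


def verify_note(note_path: Path) -> bool:
    """Return True if the note's bytes still match the checksum in its sidecar."""
    sidecar_path = note_path.with_suffix(".json")
    sidecar = json.loads(sidecar_path.read_text(encoding="utf-8"))
    return hashlib.sha256(note_path.read_bytes()).hexdigest() == sidecar["sha256"]
```

The point of the design is that nothing here needs the original software to be read again: a text editor opens the note, and any tool that can compute SHA-256 can confirm it hasn’t changed.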

Durable

If everything else I’ve said sounds unfeasible, this is actually the hardest problem when it comes to a digital archive. Everything else can be solved with today’s technology (if not with today’s user experience). In a non-networked system, your archive is only as good as the durability of the physical medium encoding those 0s and 1s (and also your backup strategy since those media will fail).

And, since I’ve posited a cloud service as the most viable way to have a UAD system today, the utility-based model of all cloud services I’m aware of also becomes problematic for longevity. If I die, how long do I have between someone going looking for my archive and my cloud provider simply deleting everything?

Not only this, but a fundamental weakness of a cloud service is that it’s usually a single point of failure. What happens if the cloud provider goes out of business? Or their data centre is drowned by a tsunami? The archive may be found in whole or in part on discrete devices, but the fact that there’s only a single canonical copy living on a rented server is concerning. We probably need something much more decentralized, like BitTorrent.
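As a rough illustration of the alternative to a single canonical copy, the sketch below keeps identical copies in several independent locations and heals damaged ones from any copy that still matches its checksum. It is not how BitTorrent works, just the simplest version of the same idea; local directories stand in for independent machines, and the function names (`replicate`, `repair`) are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Sequence


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def replicate(data: bytes, replicas: Sequence[Path]) -> str:
    """Write the same bytes to every replica directory, named by their hash."""
    digest = sha256_of(data)
    for root in replicas:
        root.mkdir(parents=True, exist_ok=True)
        (root / digest).write_bytes(data)
    return digest


def repair(digest: str, replicas: Sequence[Path]) -> int:
    """Restore missing or corrupted copies from any intact one.

    Returns the number of copies rewritten; raises if no intact copy survives.
    """
    good = None
    damaged = []
    for root in replicas:
        path = root / digest
        if path.is_file() and sha256_of(path.read_bytes()) == digest:
            if good is None:
                good = path.read_bytes()
        else:
            damaged.append(path)
    if good is None:
        raise RuntimeError("no intact copy left")
    for path in damaged:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(good)
    return len(damaged)
```

Because each copy is named by the hash of its contents, any replica can prove on its own that it is intact, so no single location has to be trusted as the canonical one. The archive is lost only if every copy fails before a repair pass runs.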

Conclusion Without Concluding

This has mostly been focused on requirements and problems, not on what a real-world solution might look like. I’m probably missing many critical things. The problem here is that there aren’t any business interests aligned with all of these criteria. Ordinarily that would make a case for regulatory involvement, but it’s hard to imagine regulation keeping pace with the ever-changing state of technology.

I look forward to continuing to explore this space, if only for my own archival peace of mind.


  1. While I’m personally focused on text, photos and videos are probably even more important to preserve for most people. 
  2. On the iPad I use an external keyboard. I often find that iOS’s poor multitasking lessens the inherent distractibility of writing on a networked computer. 
