UPDATE, February 18, 2015: Our January 2012 post linked to an article on opensourcearchiving.org. As the content on that domain is no longer available, we are reproducing the article below. Our original post is at the very bottom of this page.
The City of Vancouver Archives has been contributing to the development of the Archivematica digital preservation system for the past several years and we have just started using the 0.8 alpha release for production. This is an overview of why we got involved and where we are now.
Who we are
We’re a municipal archives, part of the City of Vancouver government. We follow the Canadian total archives concept of acquiring the records of both government and the private sector, and a wide variety of media. Our holdings include textual records, still and moving images, audio, maps and plans, and documentary art. We also have born-digital materials, the most significant being the records of the Vancouver Organizing Committee for the 2010 Olympic and Paralympic Winter Games (VANOC). We have the first Games records that are mostly digital, with more than 25TB acquired. We had been looking for a digital preservation system for many years, and in 2008, thanks to funding from the Olympic Legacy Reserve, we were able to take action.
What we were looking for
We needed a digital preservation system that could:
- be shown to comply with the Open Archival Information System (OAIS) ISO standard
- ingest a variety of types of born-digital objects; existing systems were largely concerned with only digitized files
- store objects securely, with preservation metadata
- address preservation planning, a function ignored in most systems we had considered
- provide sufficient logging to demonstrate what has been done to the objects
- be entirely open source so that we would know exactly what occurred at each stage, and to ensure longevity and exportability of the system
- be flexible and able to add features as digital curation best practices develop
- be scalable, to accommodate the ingest of large acquisitions
With other interested partners, we worked with the lead developers, Artefactual Systems, Inc., to bring Archivematica to a production release, posting our work on a wiki.
Artefactual uses an agile software development approach: start with something and keep improving it, but don’t wait to put together the perfect system. This allows for the flexibility to keep up with developments in the expanding field of digital curation.
Functional requirements: start with OAIS
In 2009, we turned the OAIS reference model into use cases, which were the basis for UML activity diagrams showing a more practical workflow. The UML diagrams were compared to the InterPARES Chain of Preservation model and gaps in arrangement and appraisal were identified. Metadata requirements were based on PREMIS and METS. Archivematica was designed to be independent of the storage or access system chosen by the institution.
Using microservices
Rather than build a full-service system from scratch, existing open source tools were integrated to provide microservices within an Ubuntu operating system. Each microservice can be upgraded, replaced or repositioned in the sequence. For example, the File Information Tool Set (FITS) developed by Harvard University performs identification, validation and metadata extraction.
The system requires approval from the archivist at key points. The dashboard shows the progress of each SIP through the various microservices and signals when a decision or approval is required. Microservices may be customized to comply with institutional policies; we have, for example, chosen to provide high-resolution access copies of photographs and video.
It became obvious that the system needed, in some ways, to mimic archivists’ workflows for processing large analogue acquisitions. In particular, the archivist must be able to appraise the digital materials at several different points in the process. Appraisal is iterative—decisions about suitability and context may need to be reconsidered and refined—so several opportunities are provided.
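The idea of a replaceable sequence of microservices with archivist approval points can be sketched roughly as follows. This is only an illustration, not Archivematica's actual code: the names `Package`, `Step`, `run_pipeline` and the step names are hypothetical, and the real microservices wrap external tools such as FITS rather than the placeholder function used here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Package:
    """A package (for example a SIP) moving through the pipeline. Hypothetical type."""
    name: str
    log: List[str] = field(default_factory=list)


@dataclass
class Step:
    """One microservice: a named action that may require archivist approval first."""
    name: str
    run: Callable[[Package], None]
    needs_approval: bool = False


def run_pipeline(package: Package, steps: List[Step],
                 approve: Callable[[Package, str], bool]) -> bool:
    """Run steps in order, logging each one.

    Stops and returns False if an approval is refused; returns True otherwise.
    """
    for step in steps:
        if step.needs_approval and not approve(package, step.name):
            package.log.append(f"halted before {step.name}")
            return False
        step.run(package)
        package.log.append(f"completed {step.name}")
    return True


def placeholder(package: Package) -> None:
    """Stand-in for a real tool invocation (assumption for illustration)."""


# Because the pipeline is just a list, a step can be replaced or repositioned
# without touching the others.
steps = [
    Step("identify", placeholder),
    Step("normalize", placeholder, needs_approval=True),
    Step("package", placeholder),
]
```

Calling `run_pipeline(Package("example"), steps, approve=lambda pkg, step: True)` runs every step and records each one in the package log. An `approve` callback that returns `False` halts the package at the first decision point, which is how a rejected appraisal would behave in this sketch.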
Our current configuration
We are running our access system on a virtual machine accessible on the City’s network. We are still running the preservation system (Archivematica) as we have throughout its development: on a Linux PC at the Archives, in a Local Area Network (LAN). In 2012, we’d like to install Archivematica as a virtual machine on the City’s network server so it will communicate with our access system and preservation storage more easily.
Beyond OAIS
In the OAIS model, a Submission Information Package (SIP) is acquired from the Producer and, during the process of Ingest, it is turned into one or more AIPs (preservation packages). It sounds so straightforward! Ingesting the carefully created and organized products of a digitization program can be pretty simple. Born-digital records, however, rarely arrive in the archives as ready-to-ingest SIPs. They can be the digital equivalent of liquor-store boxes of analogue records: a jumble that needs to be sorted out. There is a need for software to help archivists inspect and organize the digital objects into SIPs ready for Ingest.
Archivematica 0.8 includes a pre-Ingest Transfer area for SIP creation. This enables logging of the original directory structure and, in the next release, will allow for full-text indexing of text files and filetype visualization for appraisal purposes. More, including incorporating forensic tools from the BitCurator Project, is planned for future releases.
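To give a sense of what logging the original directory structure of a transfer might involve, here is a minimal sketch. It assumes a transfer is simply a directory on disk; the function name `record_structure` is hypothetical, and recording a SHA-256 checksum alongside each path is our assumption for the example rather than a description of what Archivematica itself records.

```python
import hashlib
import os


def record_structure(root):
    """Return a sorted list of (relative path, size in bytes, SHA-256 hex digest)
    for every file under root, preserving the original folder hierarchy in the paths."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            # Store paths relative to the transfer root, with forward slashes,
            # so the record does not depend on where the transfer was copied.
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                # Read in chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            entries.append((relative, os.path.getsize(full_path), digest.hexdigest()))
    return sorted(entries)
```

A record like this, taken before any files are rearranged into SIPs, lets the archivist show later how the material was organized when it arrived.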
It’s an alpha release: are we crazy?
We wouldn’t deploy alpha software in production without careful thought. We’ve been involved in the development of this system since the beginning, so we know which functions are essential for preservation, and exactly how mature they are. At the 0.8 release, the critical steps of Ingest (normalization, creation of METS and PREMIS metadata, and packaging of the AIP) all work for the digital objects we currently need to preserve. In addition, DIPs (access copies) are created and uploaded to our access system automatically. We are starting Ingest of the VANOC records and our digitized resources right now.
What’s ahead
Archivematica currently integrates with our chosen access system, ICA AtoM, and there is interest in integrating it with Collective Access. Work is in progress to allow Archivematica to ingest exports from DSpace. As partners step forward with requirements and funding, integration with other systems will be possible. Planned features for future releases are documented in the development roadmap.
We perform some tasks manually now that should be automated in the future, but we are happy to be able to start storing our digital holdings safely. We look forward to the next release, the first beta, in late spring of 2012.
Original post on AuthentiCity
For anyone who is interested in Archivematica, the digital preservation system we are helping to develop, we have a summary blog post on the development so far.
It’s at opensourcearchiving.org, the blog of the Open Source Committee of the Association of Moving Image Archivists. We’ve been contributing to this blog, and we hope it will become a useful resource for anyone looking at open source software for archival use.
Dear Vancouverites
I was looking for the very detailed blog about loading packages to Archivematica, but instead found something about basketball. Is it possible your blog link has been hijacked?
Lise
Thank you for pointing this out. We’ve put the entire content of that blog post here. Please keep in mind it was current as of January 2012.
How will Archivematica be integrated with Collective Access? Will CA be used for front-end publishing only or will it replace AtoM?
We are considering building CA for archives for various communities but are stuck with how to ensure it can still be compatible with Archives Canada standards and making sure digital preservation tools like Archivematica will work with CA.
When we wrote this post there was some interest in the heritage community in Collective Access and Archivematica. We didn’t mean to imply that we use CA or are planning to; we only use AtoM, and have no plans to deviate from that.