Open document standards are crucial to guarding against software, hardware and operating system obsolescence, according to Michael Carden, digital preservation software manager at the National Archives of Australia (NAA).
The NAA is responsible for storing government records and archival materials in a range of formats and has historically relied on what Carden describes as “boxes and boxes of paper, and cool rooms full of negatives and prints, sound, film and video”.
However, Carden pointed out during a presentation to the Sydney Linux World Conference and Expo that “most records are now created digitally”.
He listed word processor documents, spreadsheets, digital images and other media files among the material the NAA has to manage and store. These media present a raft of challenges: file formats and computer systems change over time, creating problems for those wishing to access archived data.
“Digital information becomes obsolete quickly, and changes make information inaccessible”, Carden told delegates.
In addition, the NAA has a duty to manage both analogue and digital records. Open solutions were the only way the organisation could continue to fulfil that task in light of a rapidly increasing volume of works in those formats, he said.
Carden heads up a team that has built an open source document conversion and preservation system called XML Electronic Normalising for Archives (XENA) that relies on open standards to guard against changes in operating systems and file formats over time.
“XENA relies on a plugin architecture to convert popular file formats into XML and metadata”, Carden explained. “It then stores the original document as well as the converted form side by side. For example, it calls on OpenOffice to ‘normalise’ relevant documents”. Because XENA is open source, developers can take the software and adapt it to fit their needs. Carden is actively trying to recruit open source developers and encouraged attendees at the conference to “get involved” via SourceForge.net.
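To make the idea of a plugin-based normaliser concrete, here is a minimal sketch in Python. It is not XENA's code or its file format: the registry `PLUGINS`, the `register` decorator, the `normalise` function and the XML element names are all hypothetical, chosen only to illustrate how a converter might be selected per file type and how the original bytes could be kept alongside the converted form and some metadata.

```python
import base64
import hashlib
import os
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical plugin registry: file extension -> function that converts
# raw bytes into an XML element holding the normalised content.
Normaliser = Callable[[bytes], ET.Element]
PLUGINS: Dict[str, Normaliser] = {}


def register(extension: str):
    """Register a normaliser plugin for one file extension."""
    def decorator(func: Normaliser) -> Normaliser:
        PLUGINS[extension.lower()] = func
        return func
    return decorator


@register(".txt")
def normalise_plain_text(data: bytes) -> ET.Element:
    # Illustrative plugin: wrap UTF-8 text in a <text> element.
    content = ET.Element("text")
    content.text = data.decode("utf-8")
    return content


def normalise(filename: str, data: bytes) -> ET.Element:
    """Build one record holding metadata, the original and the converted form."""
    record = ET.Element("record")

    # Metadata: source name plus a checksum of the original bytes.
    meta = ET.SubElement(record, "metadata")
    ET.SubElement(meta, "source-name").text = filename
    ET.SubElement(meta, "sha256").text = hashlib.sha256(data).hexdigest()

    # The original is always preserved, base64-encoded so it fits inside XML.
    original = ET.SubElement(record, "original", encoding="base64")
    original.text = base64.b64encode(data).decode("ascii")

    # The converted form is added only when a plugin exists for this type.
    plugin = PLUGINS.get(os.path.splitext(filename)[1].lower())
    if plugin is not None:
        ET.SubElement(record, "normalised").append(plugin(data))
    return record


if __name__ == "__main__":
    print(ET.tostring(normalise("memo.txt", b"hello"), encoding="unicode"))
```

In this sketch, supporting a new format means registering one more function, and a file with no matching plugin is still archived in its original form.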
XENA is built in Java, itself a proprietary language, but Carden insists that maintaining a CVS repository at SourceForge.net allows other developers to port the code to other languages in the future.
“Because we’re an archive and because we’re looking after Commonwealth records, we have to assure the government and people that it’s fair, transparent and trackable”, Carden said.