Digital archaeologists

17/Jun/2015

Just because a historical artefact has been uploaded doesn’t mean it won’t disappear. A Swiss project is aiming to save research databases. By Daniel Saraga

(From "Horizons" no. 105, June 2015)

“The Web never forgets”. At least, this is what experts warn us about the overlap between the Internet and our private lives. But actually nothing could be further from the truth. The continual changes in digital media and file formats are leaving in their wake veritable mountains of increasingly illegible information. And if nothing is done to save it, it will all be consigned to oblivion.

It’s actually scientific research that’s bearing the brunt of this problem, as its favourite form of expression is the database. And ageing is a very quick process for databases. Programming languages quickly become obsolete, and operating systems are soon incompatible with new servers.

“It’s a real problem”, says Lukas Rosenthaler, head of the Data and Service Center for the Humanities (DaSCH), a project that aims to guarantee the longevity of results from humanities research (see “Save the saviour”). “Once a project has come to an end, and its financing with it, it’s very unusual for researchers to continue maintaining the infrastructure. And an inaccessible database is unusable. Not maintaining it is the same as destroying it and any scientific value it has. Paradoxically, this type of research artefact can be more fragile even than a published article”.

Scribes of the third millennium

Rosenthaler has been able to save one of the most important international databases on Greek mythology, the Lexicon Iconographicum Mythologiae Classicae, which was written off in 2009 after 30 years of work. “It was completely out of service, and the company that had programmed it had gone bankrupt”, says Rosenthaler. “We eventually had to make a pirate copy of the site, because all the passwords had long since been forgotten. Our work sometimes seems more like a form of digital archaeology…”. At any rate, it gives research results a new lease of life. Harvard University, for example, is now looking at integrating the Lexicon in its commentary on Homer, by using ‘linked open data’, a component of Web 3.0 that allows information to be linked online directly and dynamically.

Along with his modestly sized team, Rosenthaler, a former physicist and director of the Digital Humanities Lab of the University of Basel, has resorted to using semantic technology to create a generic platform able to structure data coming from a wide variety of different platforms. “I think that we’ll be able to bring on board 99% of the databases used in the humanities, and perhaps even some projects from biology. In three years, we have translated some 30 projects, stretching from Greek mythology to a collection of historical photographs of mountains”.

The DaSCH takes its inspiration from the Open Archival Information System (OAIS) reference model, systematically copying data and then re-transcribing it in a state-of-the-art format. It’s a difficult and expensive process and one that must be repeated regularly. It’s pretty much the digital equivalent of what monks did during the Middle Ages. “Most research groups don’t have the means to create stable tools”, says Rosenthaler. “Ideally, we would work with them from the beginning to create long-lasting databases that could then be easily updated and migrated”.

Daniel Saraga is the Head of Science Communication at the SNSF.

Save the saviour

The Data and Service Center for the Humanities (DaSCH) is dedicated to preserving digital archives, but actually finds itself on the brink of disappearing. “Since 2008, we have been fighting to implement a stable platform”, says Markus Zürcher, Secretary-General of the Swiss Academy of Humanities and Social Sciences and project pen holder. “Everyone supports this platform. The only thing that needs fixing is financing”.

As it happens, the DaSCH is still a pilot project, and it is close to the end of its trajectory. “In March 2015, we filed a request with SERI (the State Secretariat for Education, Research and Innovation) asking for 2 million francs to cover 2017 to 2022. We’re ready to keep financing it until 2017, because any interruption to the project would be very detrimental”. By way of comparison, some 30 million francs are spent annually on humanities databases.