Who is preserving electronic journals? And how would you know?

You’d think it would not be too hard to find out who is making the digital preservation of a particular electronic journal possible. After all, there are only three main contenders: Portico, LOCKSS and CLOCKSS. All you would need to do is download and compare the lists of titles that they claim to be archiving. Sadly, as elsewhere in the world of journal subscriptions, nothing is as simple as it ought to be.

Michael Seadle, in preparing for the LuKII (LOCKSS-und-KOPAL-Infrastruktur-und-Interoperabilität) project in Germany, tried exactly this. He was interested in answering three questions put to him by project members:

  1. Duplication: How many journal titles are in both LOCKSS and Portico?
  2. Small publishers: What is their relationship with LOCKSS/CLOCKSS and Portico?
  3. Large publishers: What is their relationship with LOCKSS/CLOCKSS and Portico?

You can download title lists from LOCKSS (csv, pdf, odf, xls), CLOCKSS (pdf, xls) and Portico (xls). The formats on offer vary, but Excel .xls is at least common to all three sources. A further problem once you have the data is that titles are expressed inconsistently across the different agencies, making exact matching difficult to achieve. Michael wrote Perl routines to strip out strings in brackets so that the titles were more likely to be identical.
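
As a rough illustration of that kind of clean-up, here is a minimal sketch in Python. It is not Seadle's Perl code, whose details are not given here. The function name normalise_title and the sample titles are hypothetical, and the sketch assumes that removing bracketed strings, collapsing whitespace and ignoring case is enough to make two spellings of a title match.

```python
import re

# Matches a run of text enclosed in round or square brackets,
# together with any whitespace immediately before it.
BRACKETED = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]")


def normalise_title(title: str) -> str:
    """Return a simplified form of a journal title for matching.

    Assumption: bracketed qualifiers such as "(Online)" are noise,
    and case and spacing differences should be ignored.
    """
    stripped = BRACKETED.sub("", title)
    collapsed = " ".join(stripped.split())
    return collapsed.casefold()


# Hypothetical sample titles, invented for illustration only.
print(normalise_title("Library Hi Tech (Online)"))
print(normalise_title("  Journal of  Documentation [Print] "))
```

Real title lists will still contain mismatches that this cannot catch, such as abbreviations or changed titles, so matching on ISSN where the lists provide one would probably be more reliable.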

Matching the lists produced some interesting conclusions. There are large areas of overlap between Portico and the LOCKSS/CLOCKSS team. Seadle expects this overlap to increase, as publishers tend not to drop out of preservation schemes once they have joined up. However, 38% of Portico titles are not in LOCKSS/CLOCKSS and 31% of LOCKSS/CLOCKSS titles are not in Portico. LOCKSS has real strength in its coverage of small publishers, while CLOCKSS, using the same technology, is much stronger with large publishers. Portico is not unfriendly to small publishers and publishers, on the whole, are trusting LOCKSS/CLOCKSS and Portico equally.
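
Figures of the "X% of one list is not in the other" kind come down to a set difference once the titles have been cleaned. The sketch below shows the arithmetic only. The function name percent_not_in and the tiny title sets are made up for illustration and are not Seadle's data.

```python
def percent_not_in(own: set, other: set) -> float:
    """Percentage of titles in `own` that do not appear in `other`."""
    if not own:
        return 0.0
    return 100.0 * len(own - other) / len(own)


# Hypothetical, already-normalised title sets (not real holdings).
portico = {"journal a", "journal b", "journal c", "journal d"}
lockss_clockss = {"journal c", "journal d", "journal e"}

print(round(percent_not_in(portico, lockss_clockss), 1))   # 2 of 4 titles
print(round(percent_not_in(lockss_clockss, portico), 1))   # 1 of 3 titles
```

Note that the two percentages use different denominators, which is why they need not add up to anything in particular.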

One further conclusion from this research is that the downloading and data cleaning would have to be repeated by everyone wanting to compare the preservation status of electronic journals, and new titles are regularly being added to each agency. That this is not straightforward is likely to hold people back from asking important questions about the long-term digital health of their investment in journals.

So the move into beta phase (on 25 April 2011) for the EDINA PEPRS service that collects and displays this very data will be important for anyone wanting to repeat this kind of exercise for themselves.

Michael Seadle (2011), “Archiving in the networked world: by the numbers”, Library Hi Tech, Vol. 29 Iss. 1, pp. 189–197

About Philip Adams

Senior Assistant Librarian at De Montfort University. I am interested in digital preservation and the use of data to measure a library's impact. All comments own.
