Thursday, 14 July 2011

Populating OJS from EPrints

Now that a full complement of Amicus Curiae articles has been loaded into the SAS-Space repository, I have been looking at ways to populate the OJS database automatically using the metadata available in the repository.

We are fortunate, as ever, that EPrints provides a wide range of export formats for individual item records and for sets of records. On the Amicus Curiae Collection page, we can see that EPrints gives us the option to export the metadata for the whole collection as a bibliographic citation (plain text or HTML), in formats for reference management software (Reference Manager, BibTeX, EndNote) and in several other bibliographic data formats, including Dublin Core and METS.

However, I've chosen to base our process on the EP3 XML format of EPrints, which I've worked with before (when we migrated SAS-Space from DSpace to EPrints). It is the native EPrints export/import format, and arguably contains the most faithful serialisation of item metadata in the repository.

I've now created an XSLT stylesheet that transforms the EP3 XML for the Amicus Curiae collection into the XML format defined by OJS's "native.dtd", which OJS uses for native import and export. The biggest challenge in XSLT was grouping the journal articles by issue number, as the OJS native format requires, but once I'd found a way to do that, the rest was just fiddling about, as it so often is with metadata mapping.
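To illustrate what the grouping step involves, here is a minimal Python sketch of the same idea. It is not the XSLT stylesheet described above, just the logic in another language. It assumes each EP3 `<eprint>` record carries `<volume>`, `<number>`, `<date>` and `<title>` children in the `http://eprints.org/ep2/data/2.0` namespace, and it emits only a simplified subset of OJS native elements (`issues`/`issue`/`section`/`article`). The section title "Articles" is a hypothetical placeholder, and a real native.dtd import needs more fields than this.

```python
import xml.etree.ElementTree as ET

# Assumption: EP3 XML uses this default namespace (EPrints 3 export).
EP = "{http://eprints.org/ep2/data/2.0}"


def child_text(elem: ET.Element, tag: str) -> str:
    """Text of a direct EP3 child element, or '' if it is missing."""
    child = elem.find(EP + tag)
    return (child.text or "").strip() if child is not None else ""


def group_by_issue(ep3_xml: str) -> ET.Element:
    """Build a simplified OJS-style <issues> tree, one <issue> per (volume, number)."""
    root = ET.fromstring(ep3_xml)
    groups = {}  # (volume, number) -> list of <eprint> elements, in first-seen order
    for eprint in root.iter(EP + "eprint"):
        key = (child_text(eprint, "volume"), child_text(eprint, "number"))
        groups.setdefault(key, []).append(eprint)

    issues = ET.Element("issues")
    for (volume, number), eprints in groups.items():
        issue = ET.SubElement(issues, "issue", published="true")
        ET.SubElement(issue, "volume").text = volume
        ET.SubElement(issue, "number").text = number
        # Year taken from the first article's EP3 <date> (e.g. "2011-07" -> "2011").
        ET.SubElement(issue, "year").text = child_text(eprints[0], "date")[:4]
        section = ET.SubElement(issue, "section")
        ET.SubElement(section, "title").text = "Articles"  # hypothetical section name
        for eprint in eprints:
            article = ET.SubElement(section, "article")
            ET.SubElement(article, "title").text = child_text(eprint, "title")
    return issues


if __name__ == "__main__":
    sample = (
        '<eprints xmlns="http://eprints.org/ep2/data/2.0">'
        "<eprint><title>A</title><volume>1</volume><number>1</number><date>1997-03</date></eprint>"
        "<eprint><title>B</title><volume>1</volume><number>2</number><date>1997-06</date></eprint>"
        "<eprint><title>C</title><volume>1</volume><number>1</number><date>1997-03</date></eprint>"
        "</eprints>"
    )
    print(ET.tostring(group_by_issue(sample), encoding="unicode"))
```

Articles A and C end up in the same issue even though they are not adjacent in the export. This is the behaviour that takes some effort in XSLT 1.0, where grouping usually means a key-based technique, while here a dictionary keyed on (volume, number) does it directly.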

Once the EP3 XML has been transformed to OJS format, and with a journal already defined in OJS, we can use the OJS import function to import a complete set of issues, each containing its full complement of articles. It's also possible to include a cover image for each issue (if one is available), and each article's PDF can either be embedded in the XML using Base64 encoding or linked to by URL. Since our articles are already online in the SAS-Space repository, I used the URL option. (It seems that this copies the object into the OJS filestore: we will investigate whether it's possible to prevent this and have the online journal link straight to the item in SAS-Space.)
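The two options for getting the PDF into OJS can be sketched as follows. The element and attribute names (`galley`, `label`, `file`, `href` with `src`/`mime_type`, `embed` with `filename`/`encoding`/`mime_type`) reflect my reading of the OJS 2.x native.dtd and should be treated as assumptions to check against your own OJS version. The function name and its parameters are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Optional


def galley_element(label: str, *, url: Optional[str] = None,
                   pdf_bytes: Optional[bytes] = None,
                   filename: str = "article.pdf") -> ET.Element:
    """Build a <galley> whose <file> either links to or embeds a PDF.

    Element/attribute names are assumed from OJS 2.x native.dtd; verify them.
    """
    if (url is None) == (pdf_bytes is None):
        raise ValueError("pass exactly one of url or pdf_bytes")
    galley = ET.Element("galley")
    ET.SubElement(galley, "label").text = label
    file_el = ET.SubElement(galley, "file")
    if url is not None:
        # Link option: the XML only carries the repository URL.
        # (Per the post, OJS appears to copy the file into its own filestore on import.)
        ET.SubElement(file_el, "href", src=url, mime_type="application/pdf")
    else:
        # Embed option: the PDF travels inside the XML as Base64 text.
        embed = ET.SubElement(file_el, "embed", filename=filename,
                              encoding="base64", mime_type="application/pdf")
        embed.text = base64.b64encode(pdf_bytes).decode("ascii")
    return galley


if __name__ == "__main__":
    linked = galley_element("PDF", url="http://example.org/1/article.pdf")  # hypothetical URL
    print(ET.tostring(linked, encoding="unicode"))
```

The trade-off is mainly size: embedding makes the import file self-contained but roughly a third larger than the PDFs themselves (Base64 encodes every 3 bytes as 4 characters), while linking keeps the XML small and relies on the repository URL being reachable at import time.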

At the moment the XSLT stylesheet serves our purposes, but it offers the intriguing prospect that it could be generalised to work over any result set in an EPrints repository and made available as an EPrints Export Plug-in. That way, anyone wanting to assemble or reassemble an online journal in OJS quickly could do so from articles deposited in a repository.

This could be an attractive scenario for anyone trying to retrospectively assemble an online journal from scans of a printed journal: once the materials are deposited in the repository (with sufficient metadata, of course, and ideally OCRed), the data needed to implement a fully working journal in OJS is only a click away.

Existing OJS journal managers might even choose to manage their deposit and review workflow using the repository, and export to OJS when ready. This project gives us an interesting opportunity to compare the two approaches to item submission workflow, and I hope we'll be able to report back on that later.

2 comments:

  1. Impressive stuff; I'm sure it will generate a lot of interest. I'm particularly intrigued by the idea of using repository workflow as an alternative to the OJS one. Intuition says a system designed to do a single job, such as managing reviews, ought to be better suited to it. But not everyone finds OJS that easy to work with. I'll be watching out for your findings.

  2. Thanks Kevin. Workflow's an interesting issue at the mo - you might have seen my posts about Imma's problems with workflows in DSpace (on DA Blog or at FAO).

    I'm quite excited by the possibilities of dynamic export of repository datasets (objects included) in a variety of XML-based formats. Here we highlight OJS, but delivering search results as EPUB is also attractive and viable. First, though, we'd have to find an acceptable open-source solution to the problem of turning multi-column PDFs into text that doesn't look like William Burroughs wrote it. Suggestions welcome!

