

Deposit workflows and environments

Page history last edited by n.jacobs@... 13 years, 4 months ago


  1. Collaborative Authoring: a researcher working on a collaborative project is writing a joint paper with colleagues, some of whom are inside or outside the project and inside or outside his university.  The paper is created, edited and managed (eg, versioned) in a collaborative environment to which they all have easy access.  At various points, on agreement among all the authors, the paper is marked for submission to a journal, to the appropriate repositories, etc.  The relevant repository infrastructure includes those components that enable any such tool to work (not the tool itself).
  2. Online CV: researcher has to comply with their institutional and funder open access requirements, and in doing so can manage and update their online CV generated from their local repository (among other sources).  Researchers or their agents put the paper into (a) relevant repository/ies.  They check or (when necessary) add metadata, and add one or more relevant files.  Most if not all metadata is automatically added to the record from the depositor's workflow, external context, web, elsewhere.
  3. Reference Harvesting: repository managers run a tool that identifies references to research papers in scope for their repository, creates the best records it can (including full text where possible) and offers these to the author(s) for checking.  The relevant repository infrastructure includes those components that enable any such tool to work.
  4. Research Output Registration and Management: a researcher has to report (register) her research outputs (publications, datasets, sourcecode, computational models, workflows) in the institutional RMS - Research Management System (often referred to as a CRIS - Current Research Information System). In the same workflow she may (or is obliged to) deposit the full research output in the RMS's integrated repository with the appropriate degree/form of Open Access. Thus the researcher (or her agent) only has one system to interact with, saves time and retains focus on research problems. Which infrastructure elements (data models and formats, version and access management mechanisms, RMS-GenericRepository integration architectures, etc.) are needed to realise this in an open way supporting global reuse as much as possible? 
  5. Local Deposit - Global Propagation: a researcher registers and deposits his publications in his "local" repository or Research Management System, where "local" most likely means institutional, but may mean departmental or research team. While doing so, he may select one or more other repositories (institutional or subject specific) that he wants the metadata and full text propagated/copied to. Likewise he may choose a journal (overlay or traditional) to which he wants the paper submitted for peer review and thus certification.  Which infrastructure elements are needed to realise this in an open way supporting global reuse as much as possible?
  6. Deposit from publisher: Publisher A already deposits authors' manuscripts to PubMed Central using the NIHMS bulk submission pathway. Publisher A is willing to extend this to institutional/subject repositories, but needs to have a standard export from its system that can be used for many repositories. This should be based on the bulk upload information package (link goes to a winzipped tar file) used for NIHMS.  The publisher has an additional challenge in that DOI and publication date (necessary for embargo) are not known when the export is made and would need to be provided at a later date and matched to the manuscript. (submitted by Grace Baynes, see comments below)
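The deferred DOI/date problem in usecase 6 can be sketched in code. This is a minimal illustration only: the manuscript IDs, field names and the 12-month embargo period are invented for the example, not taken from the NIHMS package format.

```python
from datetime import date, timedelta

# Hypothetical sketch of deferred publication metadata. The manuscript ID,
# record fields and 12-month embargo are illustrative assumptions.
pending = {}  # manuscript_id -> metadata record awaiting DOI/date

def deposit(manuscript_id, metadata):
    """Initial bulk deposit: DOI and publication date not yet known."""
    record = dict(metadata, doi=None, publication_date=None, embargo_until=None)
    pending[manuscript_id] = record
    return record

def register_publication(manuscript_id, doi, publication_date, embargo_months=12):
    """Later update: match by manuscript ID, fill in DOI, compute embargo end."""
    record = pending[manuscript_id]
    record["doi"] = doi
    record["publication_date"] = publication_date
    record["embargo_until"] = publication_date + timedelta(days=30 * embargo_months)
    return record

rec = deposit("MS-0001", {"title": "Example paper"})
rec = register_publication("MS-0001", "10.1000/example", date(2009, 3, 1))
```

The key point is that the publisher's internal manuscript ID, not the DOI, has to be the matching key, because the DOI does not exist at export time.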


Which infrastructure components will be needed to make this work effectively?  (Note - the 'Ingest' briefing materials are relevant here)

  • System to system repository deposit - eg. from collaborative platform to repositories, from RMS/CRIS to repositories, from (institutional) repository to (subject) repository, from publisher to (many) repositories.
  • Either (i) agreed basic protocol for fine-grained deposit negotiation, eg:
    • depositing system: "I have one of these"
    • receiving system: "I have a possible earlier version", "I only need the MODS metadata", "what preservation commitments are we making to each other?" etc
  • ...Or (ii) repositories that can deal with all this internally on receipt of a package, eg NIHMS bulk upload package.
  • Shared metadata services
  • Metadata exchange services (exchange formats) between repositories and RMS/CRIS
  • Name / factual authority services (eg, institution identifiers, author identification)
  • Automatic metadata creation/extraction services
  • Interoperable persistent identifier infrastructure
  • A fully international, comprehensive, reliable and up-to-date SHERPA/RoMEO service that also acts as an open journal title authority list.
  • An intermediary / network-level "deposit service" that does all this?
  • Mechanisms (this should probably be broken down into components...) to allow for deposit locally with automatic propagation to a global location (such as the PDB) and vice versa (to ensure an institution has a complete-ish collection of its research outputs)
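The fine-grained deposit negotiation sketched above ("I have one of these" / "I have a possible earlier version", "I only need the MODS metadata") might look something like the following. The message names and fields are invented for illustration only; no such protocol has been agreed.

```python
# Illustrative sketch of a deposit negotiation exchange. All message
# names and fields here are hypothetical, not part of any real protocol.

def offer(item):
    """Depositing system: 'I have one of these.'"""
    return {"msg": "offer", "id": item["id"], "version": item["version"],
            "formats": list(item["metadata"].keys())}

def respond(offer_msg, holdings, wanted_formats):
    """Receiving system: report earlier versions, request only needed metadata."""
    earlier = holdings.get(offer_msg["id"])
    return {
        "msg": "response",
        "have_earlier_version": earlier is not None and earlier < offer_msg["version"],
        "send_formats": [f for f in offer_msg["formats"] if f in wanted_formats],
    }

item = {"id": "paper-42", "version": 2, "metadata": {"MODS": "...", "DC": "..."}}
reply = respond(offer(item), holdings={"paper-42": 1}, wanted_formats={"MODS"})
```

Even this toy exchange shows why option (ii) is attractive: every pair of systems has to agree on identifiers, version semantics and format names before any negotiation can happen.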


The lightweight deposit protocol SWORD aims at 'lowering the barriers to deposit' and should be well suited to facilitating some of the requirements above.
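As a concrete illustration: a SWORD (v1.3-style) client POSTs a package to a repository's collection URI with a small set of HTTP headers. The sketch below only builds those headers; the filename is a placeholder, and the packaging URI shown is illustrative rather than quoted from the profile.

```python
import base64

# Sketch of the headers for a single-package SWORD-style deposit.
# The packaging URI and filename are illustrative assumptions.

def sword_deposit_headers(username, password, filename,
                          packaging="http://purl.org/net/sword-types/METSDSpaceSIP"):
    """Build the HTTP headers for a package deposit via SWORD."""
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {
        "Authorization": f"Basic {credentials}",
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,
        "X-No-Op": "false",  # a dry-run deposit can be requested with "true"
    }

headers = sword_deposit_headers("depositor", "secret", "article.zip")
# The package bytes would then be POSTed with these headers to the
# repository's SWORD collection URI (eg with urllib.request).
```

The packaging header is what lets a receiving repository unpack and route the content without per-depositor arrangements, which is exactly the property the publisher-deposit usecase needs.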

Action Plan Development

 We would now like your input to start developing the Action Plan for this area ahead of the workshop. Specifically, we would like you to:

  1. See if you think these are the most helpful usecases
    • Please bear in mind that we need to be focussed and can't just multiply usecases indefinitely - we need to work with usecases that relate to what our users want to do, and that will tease out the necessary technology components.
  2. Edit the usecase descriptions to make sure they capture the core of the issue
    • the temptation here is to either try for the too-general or the too-specific; try to tread a middle line!
  3. Start to identify the components that might be needed for each usecase
    • It would be useful if we could start to converge on a small set of such components, so it might be a good idea to have a look at the other themes as they develop to see if they have identified possible useful components

Mogens Sandfaer and Andrew Treloar.


Comments from previous instance of the wiki



10 Dec

Leslie Carr says:

The first usecase (collaborative authoring) is realistically going to be achieved by emailing Microsoft Word documents between the authors. Google Docs are infrequently used, and email is likely to be used to handle synchronisation and communication anyway. I would also argue that submission to a repository is unlikely to be a co-ordinated and negotiated activity, unless it is arXiv or PMC that is being targeted. I wonder whether an email-based submission process might be useful.


      12 Feb

      Andrew Treloar says:

      Well, we can focus on the world as it is, or the world as we want it to be. I agree that emailing Word files around is what people currently do, but that doesn't make it either right or the optimal solution. There are a number of problems that have been identified with emailing documents around: proliferation of copies, inability to know what the latest version is, access problems if one is away from one's email and using POP. Using something (repository, Plone, CMS, wiki, BSCW) to store the canonical version of the document (and ideally all previous versions) is a huge win.


10 Dec

Leslie Carr says:

It must be possible to get exemplars of usecase #2 from various universities


      10 Feb

      Helio Kuramoto says:

      Here in Brazil, we have an initiative developed by the National Council for Scientific and Technological Development (CNPq), a funding agency, which implemented the Lattes Platform: any researcher who wants funding for their research must submit their CV electronically, and must then keep it constantly updated.  The Lattes Platform database holds more than 1,100,000 CVs. The information system developed for the Lattes Platform has been distributed to other countries such as Argentina, Portugal, Colombia and Cuba; together they form a network called ScienTI. We would like to integrate Brazilian institutional repositories with the Lattes Platform, so that CV records are updated from the metadata of research papers deposited by researchers in their university's institutional repository.


10 Dec

Leslie Carr says:

Two recent systems that deliver the functionality of usecase #3 are BibApp (from various DSpace developers; works with DSpace and EPrints via SWORD; see bibapp.org) and Symplectic (currently works with DSpace but being adapted for EPrints too via SWORD).


10 Dec

Leslie Carr says:

Use case #4 (CRIS) is really crucial because the CRIS community will be increasingly appealing directly to University Admin departments and we need to fit in well with whatever is going on. The problem is that it is the CRIS community who need to drive this activity - we need to be better informed about CERIF etc.


16 Dec

Grace Baynes says:

I'd like to request/propose a usecase that involves publisher deposit of authors' accepted manuscripts in institutional/subject repositories. In summary, how can a standard export be applied to many repositories.


Publisher A already deposits authors' manuscripts to PubMed Central using the NIHMS bulk submission pathway. Publisher A is willing to extend this to institutional/subject repositories, but needs to have a standard export from its system that can be used for many repositories. This should be based on the bulk upload information package (link goes to a winzipped tar file) used for NIHMS.


The publisher has an additional challenge in that DOI and publication date (necessary for embargo) are not known when the export is made and would need to be provided at a later date and matched to the manuscript.


It would be good to explore how the SWORD protocol could be used or extended to apply to this usecase.  I'm happy to provide further detail if needed.



      12 Feb

      Andrew Treloar says:

      I like this idea, but am unsure why the publishers would be willing to undertake it. What is in it for them?


            13 Mar

            Theo Andrew says:

            Nature Publishing Group has identified that it would like to adopt this kind of transfer (see http://www.nature.com/press_releases/archive.html). Unsure of current status.


07 Jan

Chris Rusbridge says:

Authoring support was incorporated in the negative click research repository services ideas, and was also in the Ideascale discussion I mentioned elsewhere. I was always a bit worried about it, as it suggests a change of academic work flow, notoriously hard to achieve (cf Les Carr's first comment above). It got mixed reception on Ideascale; on the one hand the top-rated idea is directly associated (get the repository into the author's work flow), on the other hand the specific idea: "The repository should provide authoring support" got only net 3 votes (10 votes for and 7 against).


But I also wanted to draw attention to Susan Gibbons's work at Rochester, mentioned at a recent CNI, and referred to in http://digitalcuration.blogspot.com/2008/12/gibbons-next-generation-academics-and.html. Rochester have an excellent approach of using anthropological techniques to study problems, and in this case are also aiming to build authoring support for "next generation" academics, ie graduate students!


      12 Feb

      Andrew Treloar says:

      Re "The repository should provide authoring support", I suspect part of the problem is that this statement is ambiguous. Does it mean the repository should provide support for editing *within* the repository environment, or that it should provide support for authoring *outside* the repository environment? I can see why people might not be excited about the former (one more thing to learn, with editing controls deficient compared to <fill in favourite text environment here>). I am arguing for the latter: edit it using whatever you like, and then use the repository to support collaboration around others similarly editing it.


      Agree with Chris that Rochester has done some great work.



10 Feb

Keith G Jeffery says:

Re use case 4, this is indeed the territory of CRIS and the CERIF (see www.eurocris.org/cerif) EU recommendation to member states.  Various organisations have implemented workflows for registering research outputs - see the Netherlands (METIS/DARE in NARCIS), Flanders (see FRIS) and Norway (see FRIDA) as examples.  The link between CRIS and repositories has been demonstrated in various countries whether the repository is EPrints or DSpace (for example, Norway has examples of both in its universities).  Most systems use the metadata in the CRIS for research evaluation, since it is usually better disambiguated than that in the repository.  CERIF provides formal syntax and declared semantics, which makes for reliable automated processing.  The Norwegians (and some others) import ISI data to try to pre-load the CRIS for publications but have an appalling error rate.  They use the CRIS for disambiguation.


12 Feb

Tim Brody says:

Another use case: deposit of large items. HTTP isn't very friendly to the deposit of large (multi-GB) objects. I'm after an "elegant" solution that will be user-friendly.


      16 Feb

      Andrew Treloar says:

      I agree that support for large data objects needs another use-case. Part of the issue here is that the whole standard repository HTTP upload/download metaphor just doesn't scale well to very large objects (network bandwidth constraints, HTTP timeouts, 'usefulness' of downloading an entire large object). So, lots to discuss (some of which Matthias and I have been talking about for a while in a Fedora advisory group context).
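      One common approach to the HTTP-timeout problem above is chunked, resumable upload: split the object into fixed-size parts, checksum each, and resume from the last part the server confirmed rather than restarting the whole transfer. The sketch below is an illustration under invented assumptions (chunk size, MD5 checksums, the resume logic), not an existing repository API.

```python
import hashlib

# Sketch of chunked, resumable upload for multi-GB objects. Chunk size,
# checksum choice and the server-side protocol are all assumptions.

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB per part (illustrative)

def make_parts(data, chunk_size=CHUNK_SIZE):
    """Yield (index, bytes, md5) tuples for each part of the object."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield (i // chunk_size, chunk, hashlib.md5(chunk).hexdigest())

def resume_point(uploaded_md5s, data, chunk_size=CHUNK_SIZE):
    """Return the index of the first part not yet safely received."""
    for index, chunk, digest in make_parts(data, chunk_size):
        if index >= len(uploaded_md5s) or uploaded_md5s[index] != digest:
            return index
    return None  # everything already uploaded

data = b"x" * 20  # stand-in for a large object
parts = list(make_parts(data, chunk_size=8))
```

      The per-part checksums also give the receiving repository fixity information for free, which matters for the preservation commitments mentioned in the negotiation bullet above.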


21 Feb

Adrian Stevenson says:

Agreed we don't want to end up with too many use cases, but it might be worthwhile at least quickly reviewing some of the scenarios we came up with for the SWORD project in case they enlighten this discussion. The key ones are outlined in the 'rationale for a standard deposit mechanism' at http://www.ariadne.ac.uk/issue54/allinson-et-al/.  We have a few more that I can provide if it seems worthwhile. We did discuss where/how items deposited using SWORD would fit into various repositories' workflows at our SWORD2 kick-off meeting, but it was a tough nut to crack. I'll have a look back at my notes on that one in case there's anything useful there. It's certainly something we'd like to pick up again with SWORD.

