
Citation Services draft project proposal

Page history last edited by Alma Swan 14 years, 10 months ago


This proposed project is about improving the ways in which citation data relating to open access research papers are identified and shared.


The importance of citation data to the research community has grown in recent years as research evaluation has become a systematic exercise, in some cases carried out at a national level (UK, Australia).

While citation statistics and h-factors were a matter of significance only between a consenting individual and the promotion/tenure committee, the collection of citation data could be left to (and purchased from) one or two commercial organisations such as ISI (Thomson Reuters) or Elsevier as the need arose. Now that an increasing number of national governments expect citation data to figure in the periodic evaluation of every researcher’s work and every institution’s impact, it is important that institutions and individuals can not only catalogue every one of their research outputs (the Open Access agenda), but also capture detailed evidence of the impact that those items have on the community. Although impact can be evidenced in many forms, including levels of repository downloads of articles, it is the citation of research outputs by a third party that still carries the most weight.



The proposal is to do the following:

•    In collaboration with Microsoft Research, to develop author support tools to improve and assist the deposit of material in repositories, including reference lists from articles

•    To develop an API that can be connected to repository software programs which will enable the recognition and analysis of items cited by articles in repositories

•    To enable the extension of the capabilities of the Citebase software  

This will be an international effort involving a team of collaborators in Europe, North America and Asia.



The scope of the project covers not only the sharing of citation data (once gathered), but also the ability to accurately recognise citations in research, scholarly and technical outputs in repositories and on the open web.


The research papers that will be used for the study may be in repositories, open access journals or elsewhere on the web, just as they normally are.  The citation data under study may relate to the objects those papers cite, or the objects citing those papers (backward or forward citation).  These citation data may also be held anywhere on the web.
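The relationship between backward and forward citation can be illustrated with a minimal Python sketch. It assumes that each paper's reference list has already been resolved to identifiers; the identifiers (`paper:A` and so on) and the function name `invert_citations` are hypothetical and are not part of the proposal. Forward citations are then simply the inverse of the backward ones.

```python
from collections import defaultdict


def invert_citations(backward):
    """Derive forward citations (what cites X) from backward citations (what X cites)."""
    forward = defaultdict(set)
    for citing, cited_items in backward.items():
        for cited in cited_items:
            forward[cited].add(citing)
    # Sort for a stable, readable result.
    return {cited: sorted(citing) for cited, citing in forward.items()}


# Hypothetical identifiers, for illustration only.
backward = {
    "paper:A": ["paper:B", "paper:C"],
    "paper:B": ["paper:C"],
}
forward = invert_citations(backward)
```

In practice the hard part is the step this sketch takes for granted: recognising that two differently formatted references point at the same object.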



The project will produce services that will serve the interests of a range of stakeholders:

•    Researchers, who will be able to track their own research impact and manage their online reputation

•    Research managers, who will be able to use the new tools to assess and better understand the research programmes they manage

•    Research funders, who will be able to monitor the research work they are supporting and to develop measures of return on investment

•    Publishers, who will have access to more accurate information about the place of their publications in the overall research arena

•    Librarians, who will be able to build innovative services to improve the ways in which research outputs are discovered, accessed and used



The project will deliver:

i) Author support tools: these will be tools for use in the main editing environments such as MS Word/EndNote and LaTeX/BibTeX (used mainly in engineering, mathematics, physics, etc.)

ii) Support for manual reference list editing and deposit in repository metadata and the general repository deposit workflow

iii) An OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) citation schema

iv) A BibEx reference list plugin (written in Java or Perl) which will target each repository and extract bibliographic data from each research article (which may be in PDF, HTML or a similar format)

v) A large testbed of representative documents collected from repositories by arrangement with repository managers

vi) Reference deconstructor software, which can break down a reference list from an article into basic constituent parts for analysis

vii) One or more basic services (citation databases) which can collect, combine and disambiguate (recognise and eliminate duplicate) citations

viii) Exemplar advanced value-added services: examples of such services might be citation graph visualisation, network visualisation and trackback (track back through the literature using citation links) services

ix) Auditing and quality assurance services (these will be critical to any application of the process for formal research assessment)
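As a rough illustration of deliverables vi) and vii), the Python sketch below breaks a reference string into constituent parts and then groups references that appear to describe the same work. It is not the project's software: the regular expression assumes a single "Authors (Year). Title. Venue." layout, the matching key (first author surname, year, normalised title) is an assumed heuristic, and the sample references are invented. Real reference lists vary far more and would need much more robust parsing.

```python
import re
import unicodedata

# Assumed layout: "Authors (Year). Title. Venue." Other styles will not match.
REF_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.?\s*"
    r"(?P<title>[^.]+)\.\s*(?P<venue>.*?)\.?$"
)


def deconstruct(reference):
    """Split one reference string into authors, year, title and venue."""
    match = REF_PATTERN.match(reference.strip())
    if match is None:
        return None  # unrecognised layout; would need manual editing
    return {name: value.strip() for name, value in match.groupdict().items()}


def match_key(parsed):
    """Build a crude key for spotting duplicates (an assumed heuristic)."""
    title = unicodedata.normalize("NFKD", parsed["title"])
    title = title.encode("ascii", "ignore").decode("ascii").lower()
    title = re.sub(r"[^a-z0-9]+", " ", title).strip()
    surname = parsed["authors"].split(",")[0].strip().lower()
    return (surname, parsed["year"], title)


def deduplicate(references):
    """Group reference strings that share the same matching key."""
    groups = {}
    for reference in references:
        parsed = deconstruct(reference)
        if parsed is None:
            continue
        groups.setdefault(match_key(parsed), []).append(reference)
    return groups


# Invented references, for illustration only.
refs = [
    "Smith, J. and Jones, K. (2008). Citation analysis in repositories. Journal of Examples, 12(3), 45-67.",
    "SMITH, J. & JONES, K. (2008) Citation Analysis in Repositories. J. Examples 12(3):45-67",
    "Brown, A. (2005). Another made-up title. Example Press.",
]
groups = deduplicate(refs)
```

Here the first two strings fall into one group despite their different formatting, while the third stays separate. A key this simple will both miss true duplicates and merge distinct works, which is one reason the auditing and quality assurance services in ix) matter.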



Some prior work has already been done in this area. Some basic citation services exist that work on particular parts of the research corpus: for example Citebase (currently working on the physics literature in arXiv), CiteSeer (which works on the computer science Open Access literature) and OAIster, the Open Access harvester and search engine.

Some of the work here (i.e. areas vi, vii and viii) will build on one or more of these existing services.

The table here gives an overview of the work that is to be done, which parties might take on the various work elements and some of the resourcing requirements. Shaded areas indicate tasks already completed or underway. Unshaded areas represent work packages that still need to be executed and for which funding is sought.




Technical partners so far committed to participating in the work are:

  • School of Electronics & Computer Science, University of Southampton, led by Dr Leslie Carr and Dr Tim Brody
  • Humboldt-Universität zu Berlin, led by Prof. Dr. Peter Schirmbacher
  • Institute for Science Networking Oldenburg GmbH, led by Prof. Dr. Dr. Eberhard R. Hilf
  • OpenAccess.se
  • FECYT (Fundación Española para la Ciencia y la Tecnología)
  • SURF
  • Microsoft Research
  • OCLC Research


Other partners are very welcome. Potential partners should contact Alma Swan to discuss their participation (a.swan AT talk21.com).



Comments are now sought on this proposal. Please add them below. You may need to join this wiki to add comments if you have not already done so. The link for joining is on the front page.

Comments (2)

MF said

at 6:11 pm on Jun 27, 2009

this can achieve something very important.
I am not certain about what you mean with OAI-PMH citation schema. Like context sets ? like http://dublincore.org/documents/dc-citation-guidelines/ ?
My understanding is that the approach to citation extraction has been very discipline-oriented up to now (slac spires, citec ...). Isn't it for efficiency?
It would be nice to have the citation extractor as a service as well for instance to check one paper on the fly for instance at the moment of ingest.

Chris Rusbridge said

at 2:30 pm on Aug 7, 2009

I wonder whether factoring David Shotton's Cito ontology into this would be a good idea? I believe it includes capturing the context of a citation, eg endorsement, contradiction, further information, etc. See http://imageweb.zoo.ox.ac.uk/pub/2008/publications/Shotton_ISMB_BioOntology_CiTO_preprint.pdf
