Interoperable identification infrastructure

Page history last edited by Andrew Treloar 14 years, 8 months ago

 

Repository Interoperable Identification Infrastructure

Draft Action Plan for International Coordination

Document information

Version: 0.3

 

Scope

This action plan describes work to help us better identify entities within our repositories and make connections across repositories. For the purposes of this action plan, these entities are limited to: Author, Institution, Repository and Object (e.g. research paper, dataset or collection). We know that researchers will contribute to an increasingly complex and diverse mesh of stores (including institutionally-supported, discipline-mandated, and publisher-hosted). We also know that we need to work on the underpinnings that will enable us to stitch these together (possibly via something like OAI-ORE or FRBR) and support a range of useful services. Note that the purpose of the workshop is not to debate competing information models, but to focus on what concrete steps we can take to improve the identification of entities in a coordinated way. However, we need to be aware that some of the interoperability challenges will come down to divergent information models.

 

NOTE: This action plan does not make any explicit recommendations about the particular identifiers to be used in each instance; this will need to be the subject of work under this action plan.

  

Benefit

International coordination to set up an effective interoperable identification infrastructure for repositories (and other web entities) will enable:

-         researchers to navigate and interact with other people and systems, with a level of reliability that enables those interactions to be richer than possible now

-         publishers to have an infrastructure that includes the DOI-based systems currently used, but that also covers other entities related to the papers and datasets they publish

-         research managers or funders to have a reliable set of assertions about the relationships between papers, authors and projects; that is, those assertions are explicit and can be machine processed[1]

-         service providers and repository managers to be able to build much richer services for their users.

  

Background

See Alma’s briefing materials, in particular these:

·        https://wiki.jisc.ac.uk/display/digitalrepositories/Author+identification

·        https://wiki.jisc.ac.uk/display/digitalrepositories/Institution+identifiers

·        https://wiki.jisc.ac.uk/display/digitalrepositories/Persistent+identifiers


Proposal

Activity 1: Undertake a significant round of workflow/usecase modelling, building on significant work already undertaken, and combined with iterative prototyping, to identify “good enough” solutions for a range of stakeholders.
Cost: 6 person months
Timescale: In parallel with other activities; total of 24 months elapsed time
Complexity: Distributed activity across different domains, single place to share results
Who:
·         Any whose funders wish to support this
·         incidental activity in existing projects/services
Notes:
·         dependent on people with the right Agile dev/reqts skills to do this well

Activity 2: Review and update existing relevant mindmaps provided as input to the workshop (and keep up to date for life of project as a resource)
Cost: 1 person month
Timescale: Short
Complexity: Single place to share/maintain results (in coordination with other activities)
Who: Alma Swan (funded through Sherpa)
Notes: May need a way of pivoting/refactoring existing mindmaps to align with this program of work

Activity 3: Repository identifiers: build on things like DOAR and ROAR to produce a registry of available repositories. This would need to accommodate both open and closed access repositories, the ability to refer to things not in the registry, some sanity-checking on entries, temporal scope, and automated pings to detect ‘repository death’
Cost: 12 person months
Timescale: 12 months
Complexity:
·         Probably mix of centralised and federated solution
·         Hub and spoke model?
Who: Europe TBD
Notes:
·         Build in an incremental way
·         Need benefits from inclusion
·         Draw on existing lists from other domains

Activity 4: Organisation identifiers: by geographical region, need to deal with temporal scope changes
Cost: 12 person months
Timescale: 12 months
Complexity:
·         Probably federated solution
·         Hub and spoke model?
Who: Everyone who cares
Notes:
·         Many to many relationship between organisations and repositories
·         Base on existing lists (incl. DNS)
·         Governance model required

Activity 5: People identifiers: develop a people identifier collection service (both human and machine queryable) to enable people to create equivalence/non-equivalence assertions between a subset of their different digital identities, and to store these equivalences
Cost: 36 person months for a prototype, plus user testing and marketing
Timescale: 18 months
Complexity: Could be federated or centralised
Who:
·         SURF?
·         And anyone else who cares
Notes:
·         Builds on existing author identifier systems
·         May be benefits in parallel approaches as a risk-mitigation strategy
·         Support for optional semi-automatic assistance with identifying possible candidate identities
·         Add different personas in version 2.0
·         What about authority files?

Activity 6: Object identifiers – phase 1: recognising the existence of a wide range of existing object identifier schemes, provide a simple object equivalence service. In version 1, this would operate at the level of FRBR Manifestations, and enable the ability to say (for instance) “the object in repository A identified by this Handle is the same as the object in repository B identified by this ARK”. Service would be queryable by both machines and humans.
Cost: 12 person months dev time, plus subsequent marketing
Timescale: 9 calendar months
Complexity: Could be federated or centralised
Who:
·         Potential for JISC to take this forward
·         Also links with DRIVER and other national activities
Notes:
·         Motivated primarily by citation tracking, enhanced publications,  usecases

Activity 7: Object identifiers – Phase 2: Move to Expression level
Cost: ?
Notes:
·         Work or Expression version of this is more like people equivalence service
·         Build on RIDIR, and VALREC projects, as well as plagiarism detection
·         Other types of relationships could be picked up here
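As an illustration of the object equivalence service proposed under activity 6, the following is a minimal sketch, not a design recommendation. It assumes identifiers can be represented as (scheme, value) pairs and that equivalence is transitive, so assertions can be merged with a union-find structure. The class name, method names and the sample identifier values are all hypothetical; the plan itself does not prescribe any data model or API.

```python
from typing import Dict, Set, Tuple

# An identifier is modelled as (scheme, value); this shape is an assumption.
Identifier = Tuple[str, str]


class EquivalenceStore:
    """Hypothetical in-memory store of 'same object' assertions (union-find)."""

    def __init__(self) -> None:
        self._parent: Dict[Identifier, Identifier] = {}

    def _find(self, x: Identifier) -> Identifier:
        # Register unseen identifiers as their own group.
        self._parent.setdefault(x, x)
        root = x
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[x] != root:
            nxt = self._parent[x]
            self._parent[x] = root
            x = nxt
        return root

    def assert_same(self, a: Identifier, b: Identifier) -> None:
        """Record the assertion that a and b identify the same object."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b

    def same(self, a: Identifier, b: Identifier) -> bool:
        """True if a and b are linked by a chain of recorded assertions."""
        return self._find(a) == self._find(b)

    def equivalents(self, a: Identifier) -> Set[Identifier]:
        """All known identifiers asserted equivalent to a (including a)."""
        root = self._find(a)
        return {i for i in list(self._parent) if self._find(i) == root}


# Made-up identifier values, for illustration only.
store = EquivalenceStore()
store.assert_same(("handle", "1234/5678"), ("ark", "ark:/12345/x1"))
store.assert_same(("ark", "ark:/12345/x1"), ("doi", "10.9999/example"))
print(store.same(("handle", "1234/5678"), ("doi", "10.9999/example")))
```

A store like this only records what has been asserted; it says nothing about whether an assertion is true, and it does not cover the non-equivalence assertions mentioned under activity 5, which would need a separate check before merging groups.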

 

 



[1] Which is not to say those assertions are necessarily ‘valid’, that is, true.
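The ‘repository death’ detection mentioned under activity 3 could be approached along the following lines. This is a rough sketch under stated assumptions: the registry entry fields, the function names and the thresholds (five consecutive failed pings, ninety days without a success) are all invented for illustration, and the actual network ping is left out so that only the bookkeeping is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RegistryEntry:
    """One repository record; every field name here is hypothetical."""
    repo_id: str
    base_url: str
    open_access: bool
    last_success: Optional[datetime] = None
    consecutive_failures: int = 0


def record_ping(entry: RegistryEntry, ok: bool, when: datetime) -> None:
    """Update an entry with the outcome of one automated ping."""
    if ok:
        entry.last_success = when
        entry.consecutive_failures = 0
    else:
        entry.consecutive_failures += 1


def looks_dead(
    entry: RegistryEntry,
    now: datetime,
    max_failures: int = 5,                       # assumed threshold
    max_silence: timedelta = timedelta(days=90), # assumed threshold
) -> bool:
    """Flag an entry when pings keep failing and no success is recent."""
    if entry.consecutive_failures < max_failures:
        return False
    if entry.last_success is None:
        return True
    return now - entry.last_success > max_silence
```

Requiring both conditions is one way to avoid flagging a repository that is only briefly offline; a flagged entry would presumably be kept in the registry with a temporal scope rather than deleted.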

 

 

 

 

 

 

 

Comments (6)

Chris Rusbridge said

at 9:37 am on Apr 8, 2009

I'm a little worried about repository identifiers, when it seems to me we don't really know what a repository is. Maybe you can finesse this; X is a repository if Y who owns/controls X says it is a repository. JISC had the working definition: "a repository is a managed store of content that enables sharing of that content, so the key words are ‘managed’, ’sharing’ and ‘content’. If a filestore (etc) does this adequately for the intended users then fine". On that basis, for example, an eJournal collection managed by OJS is a repository. This is fine, but unambiguous identifiers for ambiguous entities somehow represent a problem for me!

Peter Burnhill said

at 11:52 am on Apr 8, 2009

I suspect that your worry is indicative of something more fundamental, applying as much to 'repositories and preservation'. Perhaps this is the place to discuss it as identification requires us to know what we are identifying, as you state. I detect a lot of self-obsession with repositories. I confess to having shared at least five minutes of that obsession (in a managed, content-focused way of course) when (a) working out the essential features of Jorum, the Depot and ShareGeo, in contrast to Digimap, NewsFilmOnline or even Suncat (say) - more about which, as you know, is at edina.ac.uk . These all manage a store of content and allow others to use that content. The terms and conditions of licence apply for onward sharing, but that surely is orthogonal to the definition of repository. IMHO, a repository is a repository if and only if it supports a minimum of two use communities, by running two (and maybe three) basic services: an ingest service for a use community of depositors of content; an access service for a use community of 'users' of that content. The third service is some sort of keep-safe service, which minimally warrants the relationship between what is deposited and what can be extracted for use. Of course any such repository is likely to be run (owned/controlled) by an organisation, and it is through that organisation that any given repository gains other qualities. And of course, it is possible that responsibility for a given repository (if named, scoped and identified) could pass from one organisation to another. As an analogy, a journal is a managed store of content, the responsibility for which can pass from one organisation (typically a learned society, university or commercial publisher) to another.


Peter Burnhill said

at 11:55 am on Apr 8, 2009


But to return to what I think is the larger confusion that we should try to avoid, with identifiers and with other matters such as preservation: that is, there is life outside the repository, and even of the virtual walls of repository space. Typically, although by no means exclusively, I am referring to the world of scholarly communication & publication. Here I think we have a choice. Do we (who live inside the university and research world) attempt a universal scheme for defining / naming / identifying objects being manufactured and exchanged within the university and research world or (maybe and/or) base that upon the existing domain-specific identifiers - including the Internet-related adaptations? Thinking also of audio-visual material, and some of the industry surrounding that, they too have domain-specific schemes for defining / naming / identifying objects.

keith.jeffery@STFC.AC.UK said

at 9:50 am on Apr 9, 2009

My view remains constant:
1. any universal centrally-registered unique ID scheme is bound to fail under its own weight
2. therefore we shall have a plethora of schemes for each object
3. the problem is to identify each object uniquely and unambiguously (and thus allowing disambiguation)
4. the only way to do this is based on values of attributes of that object (basic relational theory)
5. this means choosing appropriate attributes to give maximum chance of uniqueness (like differentiating species and individuals in biology)
6. and producing appropriate software to disambiguate, then give the object a unique id for purpose P and time T (both purposes and time change and may affect the relative weight given to disambiguating attribute values)
Keith

Chris Rusbridge said

at 10:20 am on Apr 9, 2009

@Keith, perhaps the RDF lesson, given the truth of your proposition 1, is that everything is declared relative to some namespace. That's how the Shibboleth stuff works as well (in a different way). My CV asserts various CRs who could be (or maybe once were) identifiable at various institutions, as past student, past employee, current employee, current member etc. Unfortunately many of those assertions, and much of that information, are private. But my identity as an author in D-Lib Magazine, Ariadne, IJDC, ERA and whatever the Glasgow repository is called these days, is public, and we ought to be able to build on that. I'm guessing that's what item 5 in the plan above is saying!

Alma Swan said

at 11:01 am on May 4, 2009

New report on identifiers from the RLG/OCLC: http://www.oclc.org/programs/news/2009-05-01.htm
From the press release:
This report identifies the necessary components of a "Cooperative Identities Hub" that would address the problem space in the research community and have the most impact across different target audiences.

The fifteen members of the RLG Partnership Networking Names Advisory Group developed fourteen use case scenarios around academic libraries and scholars, archivists and archival users, and institutional repositories that provide the context in which different communities would benefit from aggregating information about persons and organizations, corporate and government bodies, and families, and making it available on a network level.

The report summarizes the group's recommendations on the functions and attributes needed to support the use case scenarios.
