Repositories - communicating the idea

c.rusbridge

Member since : Jul-14-2008 (Verified)
14 Ideas, 21 Comments, 45 Votes

User Activity Stream

Ideas Posted

3) My repository aims for accessibility and/or usability of its contents for the long term (say greater than 10 years).
2) My repository aims for accessibility and/or usability of its contents for the medium term (say 4 to 10 years)
1) My repository does not aim for accessibility and/or usability of its contents beyond the short term (say 3 years)
A repository should be for content which is required and expected to be useful over a significant period. It may host more transient content, but by and large the point of a repository is persistence. While suggesting a repository should be a "full OAIS" has not proved acceptable to this group so far, investment in a repository and this need for persistence suggest that repository managers should aim to make their content both accessible and usable over the medium (rather than short) term. For the purposes of this exercise, let's suggest factors of around 3: short term 3 years, medium term around 10 years, long term around 30 years plus. Ten years is a reasonable period to aspire to; it justifies investment, but is unlikely to cover too many major content migrations.

To achieve this, I think repository management should assess their repository and its policies. Using OAIS at a high level as a yardstick would be appropriate. Full compliance would not be required, but giving thought to each major concept and element would be good practice.

This "idea" is to replace the "full OAIS" approach with something more realistic and achievable.
OK, I'll go the whole hog in relation to the RRS blog posts:

At a very basic level, the RRS should [be associated with] a Persistent Storage service. Completely agnostic as to objects, Persistent Storage would provide a personal, or group-oriented (ie within the institution) or project-oriented (ie beyond the institution) storage service that is properly backed up. There’s no claim that Persistent Storage would last for ever, but it must last beyond the next power spike, virus infection or laptop loss! It has to be easy to use, as simple as mounting a virtual drive (but has to work equally easily for researchers using all 3 common OS environments). Conversely (and this isn’t easy), there must be reliable ways of taking parts of it with you when away from base, so synchronisation with laptops or remote computers is essential. It should support anything: data, documents, ancillary objects, databases, whatever you need. It’s possible that “cloud computing” eg Amazon S3, the Carmen Cloud or other GRID services might be appropriate.
Again from the RRS blog posts:

We don't think about identity management as part of the repository, although a really annoying early experience of DSpace related to the requirement for a completely separate identity. This seems to have been overcome by getting the librarian to do mediated deposit for you, but I don't have the feeling that the repository is well integrated into the institutional identity system. It should be, but I want more!

I may see the RRS as a special case of an Institutional Repository (IR), but many if not most research collaborations are cross-institutional. This means that if there is to be support for cross-institutional authoring, there has to be support for members of other institutions to log in to your RRS. And this has to be seamless and easy, ie done without having to acquire new identities.

In addition, Researcher Identity should provide name control, that is, it knows who you are and will fill in a standardised version of your name in appropriate places. It should know your affiliation (institution, department/school, group, project and/or possibly work package). It might know some default tags for your work (eg Chris is normally talking about "digital curation"). However, this naming support must extend beyond your institution, so that collaborators and co-authors can be first-class users of other features. And it should relate to your (and their) standard institutional username and credentials; nothing extra to remember. This implies (I think) something like Shibboleth support.

This is getting kind of complicated, and verging towards another complex realm of Current Research[er] Information Systems (CRIS, mentioned in other ideas). These worthy systems also aim to make your life easier by knowing all about you, and linking your identity and work together. But they are complex, have their own major projects and standards, and have been going for years without much impact that I can see, except in a few cases. The RRS should take account of EuroCRIS and CERIF (see Wikipedia page) as far as they might apply.
We should at least have this on the table. I think repositories are good for preservation, but the question here is whether they should go much further than they currently do in attempting to invest now to combat the effects of later technology and designated community knowledge base change...
Another from the Research Repository System (RRS) blog posts:

Publisher liaison is maybe controversial. But why shouldn’t the RRS staff (or your library) support you in dealing with publishers? The RRS wants your articles and your data, and should help you negotiate and reserve the rights so that they can get them. So publisher liaison would include rights negotiation, submission to the publisher on your behalf of a specific version, support through the editorial revision process, and recovery of metadata from the published version for the RRS records and your own bibliography, web page and CV. Naturally, deposit in the repository would be integrated in this workflow; you only have to authorise opening to the public, or perhaps a more restricted audience.
This is a refinement of the current top-rated idea, based on one of my blog posts on research repository systems.

Authoring support should include version control, collaboration, possibly publisher liaison, and be integrated with the repository deposit process. It does need object disclosure control, see below. Version control would support ideas, working drafts, pre-prints, working papers, submitted drafts undergoing editorial changes, and refereed and published versions. Collaboration support would need to include support for multiple authors contributing document parts, and assembly of these into larger parts and eventually “complete” drafts. It should also include some kind of multiple author checkout system for updates, something like CVS or SVN, maybe a bit WIKI-like. It must support a wide choice of document editor, eg Word, OpenOffice.org, LaTeX etc (I don’t know how to combine this with the previous requirement!).
Again, the Andy Powell idea. This one, I think, more about sharing, embedding, mashups. Think Flickr. Think sneep.
This is the Andy Powell worry; we have made the repository too much of a "special thing" operating under "library rules". Make it more like Slideshare. I'm going to express this another way...
If the repository is to become anything other than a final destination for public objects, then the user needs control over access. This control must be able to ALLOW access to the objects by colleagues, wherever they work, as well as prevent access by others.
Managing data can be a big problem. Any data that might, for example, become supplementary data in an article needs curating. Help the user by providing facilities to capture and hold intermediate versions of the data, and the final public version.
I guess this is the workflow idea again, but stated another way. Don't get too hung up on "workflows", as in the e-science meaning (kepler, taverna et al). This is about making the repository fit in what people are trying to do, eg write the article, keep multiple versions, share with their colleagues in other institutions...

Comments Posted

c.rusbridge 1 year ago
Hmm. I voted this up but then read it more closely. Yes, there are different services, and making available long term is different from accessibility now. BUT, accessibility tomorrow and the day after requires making available short term, which is part of making available medium term, which is part of making available long term.

By analogy, think of the term "not for profit". It means those nice friendly institutions as opposed to those nasty grasping "for-profit" institutions. But a not for profit that survives has to be "not for loss", and so they have to act in many ways like a "for profit".

Any repository that wants to continue to make its resources available has to be "not for loss", which means it has to have some elements of preservation!
c.rusbridge 1 year ago
Good point, and I'm guilty of a bad case of jargon, my apologies. An OAIS is an Open Archival Information System, and is the technical name devised for a repository and services designed to support long term preservation, where "long term" means long enough for things like technological change, format obsolescence, and changes in terminology or knowledge base of the so-called "designated community" (ie the community that the OAIS serves) to become significant. It is described in an ISO standard, but it would usually be cited in its openly available form:

CCSDS (2002) Reference Model for an Open Archival Information System (OAIS). IN CCSDS (Ed.), NASA. http://public.ccsds.org/publications/archive/650x0b1.pdf

The problem is partly that if you commit to long term preservation and making your repository a "full OAIS" in this way, you may be increasing the cost of running the repository significantly. There is also controversy over whether the standard is merely a reference model (as it claims), to be interpreted broadly so that even data archives that were in operation decades before the standard came into existence can comply with it, or whether it calls for the more prescriptive interpretation that some of its detailed parts seem to imply.

What I'm trying to say here is that a repository worthy of the name needs to think at least medium term, and therefore needs to think about the issues raised in the standard, without necessarily aiming for full compliance. Complete rejection of the implications of the medium and long term should not be an option!
c.rusbridge 1 year ago
I like most of what you write, but I really don't like the global knowledge waiting room label!
c.rusbridge 1 year ago
I'm in two minds here. I do hate the "metadata repository" approach, at least as currently implemented. The deal for me is, I want access to the stuff. If I think I'm on my way to it, then you frustrate me, I get annoyed. I was completely unable to find a single actual paper in So'ton's repository recently!

OTOH, I do agree that the models can differ. For a start, I've always felt there is a difference between the working paper repository and the post-print repository (which we tried to instantiate at Glasgow). We really need to find good ways of making such distinctions, specially if we are to make the repository do more. And data is again probably different, and perhaps best not mixed (at least not mixed as if it were the same).
c.rusbridge 1 year ago
I'm pretty disappointed that this one has dropped so low. OK, we are mostly being a bit picky about the exact wording of the "ideas", and this one may perhaps not be best expressed. I can understand that people find the "full OAIS preservation system" a bit hard to swallow. But in fact, much of OAIS is actually about the things that repositories do anyway: ingest, storage management, metadata management, dissemination. And repositories are definitely in the "persistent storage" space, if not in the "preserve for ever", cross my fingers and hope to die space.

At the very least we should agree that a repository is designed to provide continuing access in the near to mid term!
c.rusbridge 1 year ago
@lac, I think I disagree too, or at least partly. A CRIS is usually a bit more in the admin space. The good thing is that this allows it to take advantage of information that's in the admin databases; things like proposal references and summaries, authoritative funder names, PI names, authoritative staff names. This should allow a repository to fill in much of the metadata for the depositor, and do at least a bit of name authority. Might also help the institution and the PI by linking the output back to the grant, for reporting purposes.

The CRIS systems I've seen have tended to be more about the reporting of the articles (ie the metadata) than the content. To be fair, a lot of the repositories we see are also used in that way. But we definitely want the content.

So I think the CRIS idea is helpful, but the implementations may prove too cumbersome, given that they are integrating many local systems that are implemented differently in each institution.
c.rusbridge 1 year ago
Amber, I think you are right, but I read this as relating to assumptions about implementation architecture, not target market. A data repository might have different requirements than an OA text repository, and we should take those into account. But it's less important how the implementation is actually designed; filestore plus catalogue might be a perfectly adequate implementation.
c.rusbridge 1 year ago
Owen, I think I'm more for the second of your phrasings than the first. Getting manuscripts to the publisher is easy, email (unless they have one of those web-managed deposit systems); it's the whole editorial process that is a pain. But I was trying to be a bit blue sky here. What would REALLY make your researchers more productive AND suck them into your repository? Librarians want the researchers' content. Librarians know about (c) and licences and OA. Librarians deal with publishers (well, a bit). Librarians are moving from dealing with objects to dealing with information. Meanwhile researchers are getting more stretched, with more accountability and publishing demands, with ever larger teams, with ever wider projects, with ever more authors. So if you can help the researcher/authors manage their content during the writing process (mostly through a smart SourceForge-like system associated with the repository), AND provide them with some simple help through the publishing process, then maybe we could get to the sort of researcher support systems that might make an institution more productive.
c.rusbridge 1 year ago
It's not a bad thought, but I think it would only confuse further, unless we find a really good substitute. As the author of a number of unfortunate neologisms - think "virtual clumps" - I know how awkward it is to be saddled with something awkward. They do die in time, though.

It would be good if we could start using the repository in that sort of way, though. I think we have now won the battle over the "repository must offer OAI-PMH", and that's a good thing.

If we say "a repository is a managed, persistent way of making R, L and T content with continuing value discoverable and accessible", does that manage to combine the two senses? The discoverable bit is critical for data that are not self-indexed. (Mind you, it doesn't quite cover the non-disclosed content angle, that I think is important.)
c.rusbridge 1 year ago
Something weird happened here; I was given an opportunity to vote on this (which I thought I already had)... it was at 2 and I voted up and it dropped to 1!
c.rusbridge 1 year ago
@paul, I think we should be generic rather than systematic in relation to the word "workflow". This isn't kepler/taverna territory; more it's a general attempt to move the repository upstream, so that it is more helpful to the user, rather than being an extra, downstream burden.
c.rusbridge 1 year ago
@rachel, I wasn't thinking of VREs when I wrote the original blog posts, mainly because I've never really grasped the functionality they might offer. I'll try to have a think about the relationship. My immediate gut reaction relates to something I wrote in the blog posts but maybe not here: I didn't want to propose a complex, monolithic system (and I tend to think a VRE fits in that bag) but something more component-like...

@paul, I think I see this persistent storage bit as being associated with rather than necessarily part of the repository. I think maybe here I was responding to the well-known poor quality of storage management in research groups, and wanting to add something to help, but maybe it's not really part of the repository it...
c.rusbridge 1 year ago
OK, I'm not sure if I've misread you, so it's possible we are in agreement here. You said: "That means identifiying a real need (not an ought), something that users feel that they are not currently achieving, and would like to. I have a suspicion that this might focus around the REF and the need to increase citations and have a way of reporting both output and impacts."

It may be just the examples in the second sentence, but they seemed to me the sort of remote, down-stream advantages we so often have seemed to be talking about with users. I wanted to get the repository to work for the user upstream, even before the research is published. Something that will actually make it easier to DO the research and WRITE the paper. This does tie in with your first sentence, of course...
c.rusbridge 1 year ago
Tom, I think I'm saying something different: "work for" is not the same as "need". Not that you'll get some benefits after deposit, but that your life will be easier before deposit...
c.rusbridge 1 year ago
Just a bit more from the RRS blog posts:

Object disclosure control is crucial to this [Research Repository] system working. Many digital objects in the system would be inaccessible to the general public (unless you are working in an Open Science or Open Notebook way). You need to be able to keep some objects private to you, some objects private to your project or group (not restricted to your institution, however), and some objects public. There should probably be some kind of embargo support for the latter, perhaps time-based, and/or requiring confirmation from you before release. And since some digital objects here are very likely to be databases, there are some granularity issues, where varying disclosure rules might apply to different subsets of the database. Perhaps this is getting a bit tough!
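
To make the disclosure rules above a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the names `Visibility`, `RepositoryObject` and `can_access` are not from any real repository platform, and it assumes that a public release needs both an expired embargo and the owner's confirmation (the "and/or" above leaves that open). It also ignores the database granularity problem entirely.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional, Set


class Visibility(Enum):
    PRIVATE = "private"  # owner only
    GROUP = "group"      # named collaborators, at any institution
    PUBLIC = "public"    # everyone, subject to embargo and confirmation


@dataclass
class RepositoryObject:
    # Hypothetical record; field names are assumptions, not a real schema.
    owner: str
    visibility: Visibility = Visibility.PRIVATE
    group: Set[str] = field(default_factory=set)
    embargo_until: Optional[date] = None  # time-based embargo, if any
    release_confirmed: bool = False       # owner must confirm public release


def can_access(obj: RepositoryObject, user: Optional[str], today: date) -> bool:
    """Return True if `user` (None means anonymous) may read `obj` on `today`."""
    if user is not None and user == obj.owner:
        return True
    # Group members get in whenever the object is shared at all,
    # regardless of which institution they log in from.
    if (user is not None and user in obj.group
            and obj.visibility is not Visibility.PRIVATE):
        return True
    if obj.visibility is Visibility.PUBLIC:
        embargo_over = obj.embargo_until is None or today >= obj.embargo_until
        return embargo_over and obj.release_confirmed
    return False
```

Note that group membership is just a set of user identifiers, so a collaborator elsewhere is treated exactly like a local colleague, which is the point of the cross-institutional requirement.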
c.rusbridge 1 year ago
I should have added some more explicit ideas, from the RRS blog posts, the idea of data management (as well as a data repository):

It is essential that the Data Management elements support current, dynamic data, not just static data. You may need to capture data from instruments, process it through workflow pipelines, or simply sit and edit objects, eg correcting database entries. Data Management also needs to support the opposite: persistent data that you want to keep un-changed (or perhaps append other data to while keeping the first elements un-changed).

One important element could be the ability to check-point dynamic, changing or appending objects at various points in time (eg corresponding to an article). In support of an article, you might have a particular subset available as supplementary data, and other smaller subsets to link to graphs and tables. These checkpoints might be permanent (maybe not always), and would require careful disclosure control (for example, unknown reviewers might need access to check your results, prior to publication).
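
As a rough illustration of the check-pointing idea, here is a small Python sketch. It is only a toy under stated assumptions: the `Dataset` class and its methods are hypothetical names, the data are held in memory as tuples, and a checkpoint is simply a frozen copy of the rows taken at that moment. A real system would need storage, disclosure control and much more.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Dataset:
    # Hypothetical dynamic dataset: rows can be appended or corrected.
    rows: List[Tuple] = field(default_factory=list)
    # Named, immutable snapshots, e.g. one per article.
    checkpoints: Dict[str, Tuple[Tuple, ...]] = field(default_factory=dict)

    def append(self, row: Tuple) -> None:
        self.rows.append(row)

    def checkpoint(self, label: str) -> None:
        """Freeze the current rows under `label`; labels cannot be reused."""
        if label in self.checkpoints:
            raise ValueError(f"checkpoint {label!r} already exists")
        self.checkpoints[label] = tuple(self.rows)

    def at(self, label: str) -> Tuple[Tuple, ...]:
        """Return the rows exactly as they were when `label` was taken."""
        return self.checkpoints[label]


ds = Dataset()
ds.append(("sample-1", 1.0))
ds.checkpoint("article-v1")    # state cited by the article
ds.append(("sample-2", 2.0))   # the live data keep changing...
ds.rows[0] = ("sample-1", 1.5)  # ...including corrections
print(ds.at("article-v1"))     # the checkpoint is unaffected
```

The live rows carry on being edited and appended to, while the snapshot taken for the article stays as it was.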

Some parts of Data Management might support laboratory notebook capabilities, keeping records with time-stamps on what you are doing, and automatically providing contextual metadata for some of the captured datasets. Some of these elements might also provide some Health and Safety support (who was doing what, where, when, with whom and for how long).
c.rusbridge 1 year ago
Despite having voted against this, perhaps worth adding:

Some spinoffs you should get from your RRS would include persistent elements for your personal, department, group or project web pages (even the pages themselves). It should provide support for your CV, eg elements of your bibliography, project history, etc. It will provide you and your group with persistent end-points to link to.
c.rusbridge 1 year ago
I think (but am not certain) that a CRIS is more about linking services than an open source platform. There is EuroCRIS (www.eurocris.org), an association that is interested in:

- "Research databases, thematic and according to type of information (expertise, projects, institutions, facilities and products - including publications)
- CRIS related data: scientific datasets, (Open Access) institutional repositories, knowledge-assisted data collection systems and process-based workflow systems
- Maintenance and dissemination of CERIF (Common European Research Information Format) to accommodate implementation and interoperability of CRIS (related) systems
- Data access and exchange mechanisms, standards and guidelines and best practice for CRIS"

Notice CERIF as an interoperability protocol of some kind. I'm not sure if this is useful, but someone has been thinking hard in this area, and maybe we should assess what they are doing!
c.rusbridge 1 year ago
Owen, I like that. And those "places" are re-inforced by a bunch of institutional mechanisms, many of which are not helpful in supporting the users with their tasks.
c.rusbridge 1 year ago
The value-added service that I can believe in, and that requires consistency, is the creation of "virtual subject repositories" by linking across actual institutional repositories. In fact, the single major plus-point of OAI-PMH that I can see (since no-one actually uses the service providers that it can help create) would be to allow these virtual repositories. But the inconsistency of partitioning or subject classification etc makes this difficult or impossible...
c.rusbridge 1 year ago
My concern is that this focuses on the benefits of the collection; I think the benefits have to come before the objects get into the collection!