|
|
The purpose of this site is to gather ideas and opinions about how repositories are defined and consistency between them. The discussion will be used to prepare the outputs from the Repository Architecture meeting on the 3rd of July and the Repositories Road Map meeting on the 21st of July and a report on consistency.
Please comment or vote on ideas or post your own for others to comment and vote on.
|
|
6, Define repository as part of the user’s (author/researcher/learner) workflow
It is important to take account of users' workflows when defining a repository, so that it is not seen as a system removed from the user's daily routine.
Tags : workflow users
|
|
|
Repositories are dead, long live repositories
The current repository technology is library/cataloger centric: items are uploaded (usually by a cataloger, not the author), and most of the meta-data is added by a subject specialist. In this model, the author-as-depositor is (at best) just an initiator for a deposit process.
A better solution would be to move towards a Current Research Information System [CRIS], where the academic can organise their areas of interest [AOI]; see the research grants they have (and associate them with their AOIs); and lodge keep-safe copies of work-in-progress, data-sets, talks, ideas for future work, posters, etc (and associate them with grants or AOIs).
From this corpus of data, the academic can indicate what is visible locally (within the research group/department/organisation) and what is available globally... and from that "globally available" pool, an "Institutional Repository" can be assembled.
The big advantage of a system like this is that the user only needs to define the meta-data specific to that object (an AOI has a title and a description, and inherits a creator from the CRIS; an article has a title and an abstract, but also inherits data from the associated grant and/or AOIs). This is a much smaller "keystroke" barrier (or whatever you call that "I don't want to enter lots of metadata" problem).
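The inheritance described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Record` class, the field names and all the example values are invented here and do not come from any particular CRIS product.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A CRIS object that stores only its own metadata and inherits the rest."""
    own: dict
    parents: list = field(default_factory=list)

    def metadata(self) -> dict:
        """Merge inherited metadata; values set on this object win over inherited ones."""
        merged: dict = {}
        for parent in self.parents:
            merged.update(parent.metadata())
        merged.update(self.own)
        return merged


# Hypothetical example data: every name and value below is invented.
profile = Record({"creator": "A. Researcher"})
aoi = Record({"aoi_title": "Digital curation"}, [profile])
grant = Record({"funder": "Example Funder", "grant_id": "EX-001"}, [profile])

# The author types only a title and an abstract for the article...
article = Record({"title": "A draft article", "abstract": "A short abstract."}, [aoi, grant])

# ...and the creator, funder, grant and AOI details come along for free.
full = article.metadata()
```

Here the article carries two typed-in fields, while `full` also contains the creator, funder, grant identifier and AOI title picked up from the associated objects.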
|
|
|
5, Focus on services to users enabled by digital collections in repositories
(i.e. emphasise benefits)
Tags : users benefits services
|
|
|
Consistency between repositories is not an end in itself; it is only important if it is a requirement of real value-added servic
Interoperability needs to be motivated by service requirements, not fetishized as an end in itself.
|
|
|
Allow the user fine-grained disclosure/access control to repository objects
If the repository is to become anything other than a final destination for public objects, then the user needs control over access. This control must be able to ALLOW access to the objects by colleagues, wherever they work, as well as prevent access by others.
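As a rough sketch of what such control might look like, assuming users are identified by a cross-institutional identifier of the form name@institution (the `RepositoryObject` class and the identifiers are hypothetical, not taken from any existing repository platform):

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryObject:
    """An item whose owner decides who may read it."""
    owner: str
    public: bool = False                      # private until the owner opens it up
    allowed_users: set = field(default_factory=set)

    def allow(self, user_id: str) -> None:
        """Grant read access to one named colleague, wherever they work."""
        self.allowed_users.add(user_id)

    def can_read(self, user_id: str) -> bool:
        """Public objects are readable by all; otherwise only owner and named users."""
        return self.public or user_id == self.owner or user_id in self.allowed_users


# Hypothetical users at two different institutions.
draft = RepositoryObject(owner="alice@uni-a.example")
draft.allow("bob@uni-b.example")

colleague_ok = draft.can_read("bob@uni-b.example")     # named colleague elsewhere
stranger_ok = draft.can_read("carol@uni-c.example")    # not named, object not public
```

The default is deliberately closed: nothing is readable by others until the owner either names a colleague or marks the object public.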
|
|
|
Say what we mean: stop using the term repository
When we use the term repository in the context of JISC (and other repository networks), essentially it means making content (in our case produced as part of research, learning and teaching) available over the network so it can be shared and used. But the word doesn’t say that. The word says store. We should be saying what we mean. Shouldn't we really be talking about making content available on the web? And if we are concerned with preserving content, we should talk about doing that, and so on. The term repository has almost become meaningless because so many uses and functions are bundled up together under it.
|
|
|
1, Definition assumptions
The definition should not make assumptions about implementation architecture, i.e. whether deposited collection(s) are held at institutional or network level.
Tags : repository architecture institutional network level
|
|
|
7. We shouldn't be thinking of repositories as a place.
With acknowledgement for this idea to Owen Stephens' recent Tweet. My interpretation of this idea is that 'repositories' are best viewed as a 'type' of data store supporting a variety of services, embedded in various workflows. This fits nicely with Paul Walk's concept of a 'source repository' (see http://tiny.cc/FIHwc) being a simple system with complexity moved to specialised services. I suppose this approach isn't that far removed from the original OAI concepts of data provider and service provider, though the focus there was on access whereas now we are considering a wider context for repositories.
Tags : place source
|
|
|
There are feasible and worthwhile approaches which will improve the consistency with which repositories share metadata
As part of our work to "examine the feasibility of approaches to improve the consistency with which repositories share material", we are looking at this in regard to 3 areas: metadata (this idea), the materials themselves and descriptions of repository policies (e.g. on IPR) [materials and policies appear as separate ideas].
Tags : repository jisc
|
|
|
The repository should be more like "part of the web"
This is the Andy Powell worry; we have made the repository too much of a "special thing" operating under "library rules". Make it more like Slideshare. I'm going to express this another way...
|
|
|
The repository/library should provide support in the publishing process
Another from the Research Repository System (RRS) blog posts:
Publisher liaison is maybe controversial. But why shouldn’t the RRS staff (or your library) support you in dealing with publishers? The RRS wants your articles and your data, and should help you negotiate and reserve the rights so that they can get them. So publisher liaison would include rights negotiation, submission to the publisher on your behalf of a specific version, support through the editorial revision process, and recovery of metadata from the published version for the RRS records and your own bibliography, web page and CV. Naturally, deposit in the repository would be integrated in this workflow; you only have to authorise opening to the public, or perhaps a more restricted audience.
|
|
|
Make the repository work for the user, not the other way round
I guess this is the workflow idea again, but stated another way. Don't get too hung up on "workflows" in the e-science sense (Kepler, Taverna et al). This is about making the repository fit in with what people are trying to do, eg write the article, keep multiple versions, share with their colleagues in other institutions...
|
|
|
2, Different definitions are required for different audiences
Repository does not mean much to a researcher but it has a very specific meaning to a librarian. Therefore we need to make sure that there are definitions that can be tailored to specific audiences to ensure that messages are understood.
Tags : audiences communication
|
|
|
Help the user manage data
Managing data can be a big problem. Any data that might, for example, become supplementary data in an article needs curating. Help the user by providing facilities to capture and hold intermediate versions of the data, and the final public version.
|
|
|
Repository is associated with a persistent storage system
OK, I'll go the whole hog in relation to the RRS blog posts:
At a very basic level, the RRS should [be associated with] a Persistent Storage service. Completely agnostic as to objects, Persistent Storage would provide a personal, or group-oriented (ie within the institution) or project-oriented (ie beyond the institution) storage service that is properly backed up. There’s no claim that Persistent Storage would last for ever, but it must last beyond the next power spike, virus infection or laptop loss! It has to be easy to use, as simple as mounting a virtual drive (but has to work equally easily for researchers using all 3 common OS environments). Conversely (and this isn’t easy), there must be reliable ways of taking parts of it with you when away from base, so synchronisation with laptops or remote computers is essential. It should support anything: data, documents, ancillary objects, databases, whatever you need. It’s possible that “cloud computing” eg Amazon S3, the Carmen Cloud or other GRID services might be appropriate.
|
|
|
The repository should be meshed into a more sophisticated system of researcher identity management
Again from the RRS blog posts:
We don't think about identity management as part of the repository, although a really annoying early experience of DSpace related to the requirement for a completely separate identity. This seems to have been overcome by getting the librarian to do mediated deposit for you, but I don't have the feeling that the repository is well integrated into the institutional identity system. It should be, but I want more!
I may see the RRS as a special case of an Institutional Repository (IR), but many if not most research collaborations are cross-institutional. This means that if there is to be support for cross-institutional authoring, there has to be support for members of other institutions to log in to your RRS. And this has to be seamless and easy, ie done without having to acquire new identities.
In addition, Researcher Identity should provide name control, that is, it knows who you are and will fill in a standardised version of your name in appropriate places. It should know your affiliation (institution, department/school, group, project and/or possibly work package). It might know some default tags for your work (eg Chris is normally talking about "digital curation"). However, this naming support must extend beyond your institution, so that collaborators and co-authors can be first-class users of other features. And it should relate to your (and their) standard institutional username and credentials; nothing extra to remember. This implies (I think) something like Shibboleth support.
This is getting kind of complicated, and verging towards another complex realm of Current Research[er] Information Systems (CRIS, mentioned in other ideas). These worthy systems also aim to make your life easier by knowing all about you, and linking your identity and work together. But they are complex, have their own major projects and standards, and have been going for years without much impact that I can see, except in a few cases. The RRS should take account of EuroCRIS and CERIF (see Wikipedia page) as far as they might apply.
|
|
|
Recognise the differences in services for preservation and services for sharing
The umbrella term "repository" conflates two very different kinds of services - services whose primary purpose is to preserve a type of media, and services whose primary purpose is to enable media to be shared and used by people. They don't look the same, they have different kinds of users and roles, they don't share the same concerns, and you use different language to talk about their features. Maybe we would get further by having an amicable divorce, and only get together to talk about things that are completely generic, like storage.
|
|
|
Repository should aspire to make contents accessible and usable over the medium term
A repository should be for content which is required and expected to be useful over a significant period. It may host more transient content, but by and large the point of a repository is persistence. While suggesting a repository should be a "full OAIS" has not proved acceptable to this group so far, investment in a repository and this need for persistence suggest that repository managers should aim to make their content both accessible and usable over the medium (rather than short) term. For the purposes of this exercise, let's suggest factors of around 3: short term 3 years, medium term around 10 years, long term around 30 years plus. Ten years is a reasonable period to aspire to; it justifies investment, but is unlikely to cover too many major content migrations.
To achieve this, I think repository managers should assess their repository and its policies. Using OAIS at a high level as a yardstick would be appropriate. Full compliance would not be required, but giving thought to each major concept and element would be good practice.
This "idea" is to replace the "full OAIS" approach with something more realistic and achievable.
|
|
|
The repository should provide authoring support
This is a refinement of the current top-rated idea, based on one of my blog posts on research repository systems.
Authoring support should include version control, collaboration, possibly publisher liaison, and be integrated with the repository deposit process. It does need object disclosure control, see below. Version control would support ideas, working drafts, pre-prints, working papers, submitted drafts undergoing editorial changes, and refereed and published versions. Collaboration support would need to include support for multiple authors contributing document parts, and assembly of these into larger parts and eventually “complete” drafts. It should also include some kind of multiple author checkout system for updates, something like CVS or SVN, maybe a bit WIKI-like. It must support a wide choice of document editor, eg Word, OpenOffice.org, LaTeX etc (I don’t know how to combine this with the previous requirement!).
|
|
|
There are feasible and worthwhile approaches to improve the consistency with which repositories share the materials they hold
Part of our work to examine the feasibility of approaches to improve the consistency with which repositories share the materials they hold (this idea), the metadata and descriptions of repository policies
Tags : repository jisc
|
|
|
The repository should have more "web 2.0" features
Again, the Andy Powell idea. This one is, I think, more about sharing, embedding, mashups. Think Flickr. Think sneep.
|
|
|
Institutional research repositories are based on different models - not only solely a 'digital object' repository
Most early Institutional Repositories were research repositories. Some are purely repositories housing digital objects, as in "Repositories are 'collections of digital objects'". However, since one of the primary aims is to showcase the intellectual assets of the institution (as compared to providing Open Access to peer-reviewed journal articles), another model was 'hybrid'. The use as a bibliography (suggested both by previous practice and by senior academics) required the metadata to be deposited even if it was not possible to deposit the 'publication'. This is particularly important if you want to showcase the whole institution well, including the Humanities, where outputs such as a book or an exhibition are not so easily deposited. Therefore one model is 'hybrid', including both digital objects and their metadata, and sometimes just metadata, or metadata plus links to trusted repositories elsewhere. This latter aspect may become more important as the number of these trusted (eg funder) repositories grows. Of course, you can also make a subset of this repository which includes 'full text only', as in the alternative "digital object repository" model, but this does not then give a full picture of the institution.
Hey, Jessie M.N., Simpson, Pauline and Carr, Leslie A. (2005) The TARDis Route Map to Open Access: developing an Institutional Repository Model. In, Dobreva, Milena and Engelen, Jan (eds.) ELPUB2005 From Author to Reader: Challenges for the Digital Content Chain: Proceedings of the 9th ICCC International Conference on Electronic Publishing, Katholieke Universiteit Leuven, Leuven-Heverlee, Belgium, 8-10 June 2005. Leuven, Belgium, Peeters Publishing, 179-182. http://eprints.soton.ac.uk/16262/
Tags : institutional research repository hybrid model
|
|
|
Broad principles not tight prescriptions
The changes in technology, the diversity of cataloguing practice, the diversity of ownership and legal considerations and the possibilities for metadata to be created remotely all mean that acceptable and achievable recommendations for consistency between repositories are likely to be broad principles with examples of good practice rather than prescriptive rules or precise recommendations.
Tags : recommendations schema policies metadata
|
|
|
Service creators will be looking for human-readable specifications
People who might create services from repository-based information will be looking for simple human-readable information on the policies, formats and metadata used by repositories. This is as important as creating machine-readable interfaces.
Tags : human readable machine readable repository services
|
|
|
Let's think outside the box....
This is focused on the researcher world, but the arguments hold for other fields
Q: What is the primary factor for ranking researchers? A: Citations. Surely the aim, therefore, of the researcher is to market her work as widely as possible, to maximise the potential for citation. Given that we are now in the Information Age, where the Internet is the primary source of answers (backed up by reading what has been found, on paper), the sensible solution is to place enough of the research results on the Internet that they can be found, assessed and followed up. Where on the Internet this material is placed is almost moot: the Internet has no location per se, and search engines index everything, everywhere.
Q: What is the primary factor for ranking Institutions? A: The amount of research performed by researchers of standing (see above). Surely the aim, therefore, of the Institution is to market the work of its researchers, with sufficient "corporate identity" attached, as widely as possible, to maximise the readership of that work.
THEREFORE I think we can say that researchers need publicity, and Institutions want to be the ones to do it.
The question I see is: "How can we make it easiest for the researcher to publicise their work, and how can we help the Institution capitalise on that individual publicity?"
"Institutional Repositories" are the current solution - are they the right one?
|
|
|
The Repository is about re-engineering institutional business processes
The concept of the repository is difficult to distinguish from other kinds of institutional services which might be offered (digital archiving for example), unless the original context of the idea is considered, which was (and is) the scholarly communications crisis.
Within the context of the scholarly communications crisis, the original purpose of repositories was to provide a way in which universities could enable access to research which they could no longer afford to buy because of the rising costs of journals. The repository was therefore a tool to aid the re-engineering of the business process of the library (as we came to think), or even the business processes of the entire university (as we now tend to think).
My own view is that if the repository isn't serving that function, it isn't a very useful concept.
Tags : repository definition re-engineering business process scholarly communication crisis
|
|
|
Metadata will increasingly be created remotely at the point of need
Far from becoming irrelevant, metadata for repository items will become more important but it will increasingly be created and assigned remotely. This will be by automated procedures such as indexing and text analysis and also by users and readers, through the use of tagging mechanisms. These developments will have implications for consistency between repositories and between items.
Tags : metadata tagging auto classification indexing
|
|
|
4, Definition should encompass likely evolution in software solutions
Examples include content management systems, virtual research environments, CRIS etc
Tags : software future
|
|
|
There are feasible and worthwhile approaches to improve the consistency with which repositories share their policies
Part of our work to examine the feasibility of approaches to improve the consistency with which repositories share descriptions of repository policies (e.g. on IPR) - this idea -, metadata and the materials themselves.
Tags : repository jisc
|
|
|
Form follows function in defining "global knowledge waiting rooms"
Humans have never before been called on to save so much stuff in whatever we name these digital containers. Historically we have been compelled by circumstance to let things go, albeit often unwillingly. The list of what we leave behind can include almost everything we care about--books, photographs, hard drives, memorabilia and artworks--to even bigger items such as houses and cars. Whether selling, donating, recycling, sharing or being forced to abandon our stuff due to unforeseen circumstances, we are more or less wired to cope with the dynamic lifecycle of digital and analog “knowledge objects” as they intersect with our lives.
As I hold out hope that I will find time to organize, annotate and share my digital photo archive, others look to keeping ideas--digital text and rich media--around long enough for academic and public review that holds the promise of transformation into vetted knowledge. The timetable for when a paper, dataset or video will become useful, or perhaps even critical, is often unknown. As the cost of storing digital stuff has gone down, we seem less willing than ever before to let things go. The conundrum of coping with the biggest information deluge in human history, coupled with cheap storage and an unknown timetable for usage, seems to equal a disruption in our collective ability to merge and purge our stuff. Terminology discussions about what to call (packed) global knowledge waiting rooms seem to me to be a by-product. We can now afford to rent an endless number of mini-storage units, but will never have time to arrange or make use of their contents.
Les Carr pointed out during OR08 in Southampton earlier this year that collecting and curating over time is what a persistent and permanent repository backed by policies and institutional commitment implies. A repository is not intended to be a fly-by-night dumping ground. About ten years ago the term "digital library" seemed to be a way to give small or large, and sometimes poorly organized, collections of academically-created web pages a certain gravitas that would promote preservation. A "portal" to resources is also a term that has been used to imply "more than a mere web site." Terminology that is meant to denote REALLY IMPORTANT STUFF has been around for a while. What has been missing in finding the right name is a view towards specific functionality that might contribute to a knowledge workflow on top of resources to make use of really important stuff.
In his keynote address at JCDL 2008 Alex Szalay explained that there is a science project pyramid that builds on a single lab at the base, a multi-campus project in the center, and international consortia on top, as scientific disciplines recognize the need for major initiatives that are highly collaborative and distributed. He suggested that the output from these efforts at every scale contains:
–Literature
–Derived and re-combined data
–Raw data
Szalay would like to see a continuous feedback loop among these three aspects where data and analysis are always updating. In my view the active form that Szalay outlined should be encompassed by a term that implies the inherent function of a semantically-enabled analysis loop in a dynamic "knowledge waiting room."
|
|
|
Minimal metadata for sharing?
For all practical purposes, the ability to express metadata as the Dublin Core metadata elements is a sufficient baseline for sharing repository items across subject and institutional domains.
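To make the baseline concrete, here is a minimal sketch of expressing one item as unqualified Dublin Core in the oai_dc wrapper commonly used for OAI-PMH. The `to_oai_dc` helper and the sample item are invented for illustration; only the two namespace URIs and the fifteen element names come from the published standards.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def to_oai_dc(fields: dict) -> str:
    """Serialise {element name: [values]} as an oai_dc:dc XML fragment."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:          # every element is optional and repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical item: all values are invented.
record_xml = to_oai_dc({
    "title": ["A draft article"],
    "creator": ["Researcher, A.", "Colleague, B."],
    "type": ["Text"],
})
```

Anything richer than these elements has to be flattened or dropped on the way out, which is the practical cost of treating Dublin Core as the lowest common denominator.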
Tags : metadata dublin core sharing
|
|
|
nouns are for numpties
Should we regard Repository as yet another noun of uncertain parentage that is searching for meaning? R is for Repository in the same way that P was for Portal, O was for Ontology and M was for Metadata. N? Nothing just yet, or maybe N is for Network. Fortunately, the other nouns have had some prior usage. Repository has less common usage except as a place to store furniture.
Personally I've always preferred verbs, as these almost automatically put the focus on actions (or states of being), on tasks of actors, as essential links in the subject/verb/object triple.
It is not self-evident to me whether Repository is a new label to describe something(s) that have existed for a while or something(s) that have come into existence to justify the term. In the early 1980s I worked with the Scottish Education Data Archive which was a collection of survey datasets (on related topics, and generated by a research centre in a university over time), then in the mid to late 1980s and 1990s I worked with Edinburgh University Data Library which was both a collection of user-contributed datasets and a collection of third-party published datasets. In both instances the purpose of the 'data archive' and 'data library' was to provide access to those datasets to them that wants them. For both, there was some forward thinking, in that we collected and curated ahead of demand: for example we took in user-contributed digitised boundaries of a particular geography before we knew anyone would re-use them. In the late 1990s and since I have worked on a variety of online services which depend upon the management of databases of data objects, datasets and datastreams that others (not me) have created, although these are mostly not 'user- or community-generated'. We did not call them repositories at the time - or not until quite recently.
Jorum is a national repository of learning materials, devised and developed by staff at EDINA and Mimas in response to expressed requirements to keep stuff safe, and to enable and facilitate sharing. What makes it a repository? It's a database that we call a repository. Why? Because that was the term of the moment and was and is understood within a certain 'designated community', but not much beyond. When thinking about Jorum, about the repository built for GRADE, and for the store of datasets used for eMapScholar, and then for the Depot (in the Prospero project) we thought that Cliff Lynch's statement that "a university-based institutional repository is a set of services" needed re-phrasing: a repository is a managed database that supports three (or more) services, necessarily including deposit (ingest), keep-safe, access (download). But any decently managed database does that, surely?
M2M access, by API (and OAI-PMH) has been put up as a necessary characteristic of a repository, but that m2m access has been commonplace for many services from EDINA and Mimas, and again is that not just what we would want from any managed network-accessible database?
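For readers who have not met OAI-PMH, the m2m access in question amounts to plain HTTP requests with a handful of query parameters. A minimal sketch of building a harvesting request follows; the `list_records_url` helper and the base URL are hypothetical, while `verb`, `metadataPrefix`, `set` and `from` are request arguments defined by the protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access is made here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict harvesting to one set
    if from_date:
        params["from"] = from_date    # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; a real repository publishes its own base URL.
url = list_records_url("https://repository.example.org/oai", from_date="2008-07-01")
```

Seen this way, the point stands: there is nothing here that a well-managed, network-accessible database could not also offer.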
Digimap is built upon a range of databases, some populated by data from the Ordnance Survey, some by derived data (value added, curated by EDINA) and now also some contributed by users.
I confess I am at a loss to understand what is distinctive about a repository. Except perhaps, that the attention should focus on the quality and nature of the service that is delivered to the (potential) depositor. Understanding why someone wants to deposit (share) something, and what would constitute reward (in terms of happiness not just lack of pain) for the act of depositing is hard, elusive and novel. We are examining how to make the Depot into a service for happy putting, so too with Jorum. The motives for sharing differ, as does the nature of the workflow during which 'deposit' could be considered natural.
Now B is for Bucket: must it hold objects as well as liquid, must it provide means by which things can be poured into it, as well as out? Is there a hole in the bucket, does it have to have a handle, what if there was a spout?
|
|
|
A series of 20 key interviews to assess the feasibility of approaches to improve consistency
In particular we want to ask these key interviewees the questions to which you would like to hear the answers. So if you have an interesting or useful question (or more than one), particularly concerning the creation of user-facing services using repository content, then please use the comment facility to suggest it. Or even better put it here as a new idea (go to the home page and choose New Idea, then choose category "consistency") and then others can comment on it.
And we need to know who to ask. Who will be the most useful people to get these answers from - whose views would you be intrigued to hear on this topic? Again use the comments facility, or better still put a name here as a new idea (go to the home page and choose New Idea, then choose category "consistency") and then others can comment on it.
Tags : repository jisc
|
|
|
We should embrace inconsistency
We cannot achieve consistency, so if it is important then we are doomed to failure. Why can't we achieve consistency?
There are (say) 200 universities in the UK, and perhaps 20,000 worldwide, then there are subject repositories, project repositories, library and archive repositories and commercial repositories (which may be free or charged for or a mixture).
There are data repositories, image repositories, paper repositories etc.
All these repositories are set up for particular reasons and will want to achieve different things. What the BBC wants people to do with theirs is very different to, say, NICE or the University of Wigan. They will, inevitably, have different collection policies, different ideas on appropriate metadata standards, and different methods of access (an image repository or data repository will require different affordances to a text repository).
To expect any form of consistency (of language, of policy, of metadata, of standards, even of legal scope) will simply not work.
Indeed, I would suggest that to achieve consistency we would require working in a closed community, and even then it would probably not work.
The alternative is to embrace inconsistency and work with that.
Tags : consistency repository
|
|
|
The repository should be a full OAIS preservation system
We should at least have this on the table. I think repositories are good for preservation, but the question here is whether they should go much further than they currently do in attempting to invest now to combat the effects of later technology and designated community knowledge base change...
|
|