Back in June I attended the 3rd SITS (Scholarly Infrastructure Technical Summit) meeting, held in conjunction with the OAI7 workshop and sponsored by JISC and the Digital Library Federation. This meeting, held in lovely Geneva, Switzerland, brought together library technologists and technology leaders from North America, Europe, Australia, and Asia to explore common technology and technology-related issues that cross our geographic boundaries.
This was the first SITS meeting that I attended; there were two earlier SITS meetings (one in London and one in California). As this SITS meeting was attached to the OAI7 conference, it brought together a group of stakeholders whose roles in their organizations ranged from technology implementers to technology strategists and decision makers. From chatting with some of the folks who had attended previous SITS meetings, I gathered that the attendees at those meetings leaned heavily toward the technology implementer / developer side, while this particular instance of SITS had a broader range of discussion that, while centered on technology, also incorporated much of the context in which technology was being applied. For me, that actually made this a more intriguing and productive discussion. While there are certainly a great variety of strictly technical issues with which we grapple, what often gets lost when talking semantic web, linked data, digital preservation, etc. is the context and purpose of deploying said technology. So, with that particular piece of context, I’ll describe some of the conversation that occurred at this particular SITS event.
Due to the schedule of OAI7, this SITS meeting was held in two parts – the afternoon of 24 June, and the morning of 25 June. For the first session, the group met in one of the lecture rooms at the conference venue, and this worked out quite nicely. SITS uses an open agenda / open meeting format, which allows the attendees to basically nominate and elect the topics of discussion for the meeting. After initial introductions, we began proposing topics. I tried to capture as best I could all of the topics that were proposed, though I might have missed one or two:
* stable links for linked data vs. stable bitstreams for preservation
* authority hubs / clustered IDs / researcher IDs / ORCID in DSpace
* effective synchronization of digital resources
* consistency and usage of usage data
* digital preservation architecture – integration of tape-based storage and other storage environments (external to the library)
* integration between repositories and media delivery (i.e. streaming) – particularly for access control enforcement
* nano publications and object granularity
* pairing storage with different types of applications
* linking research data to scholarly publications to faculty assessment
* well-behaved document
* research impacts and outputs
* linked open data: from vision to deployment
* relationship between open linked data and open research data
* name disambiguation
Following process, we took the above brainstormed list and voted on which topic to discuss first. The first topic chosen was researcher identities, which began with discussion around ORCID, a project that currently has reasonable mindshare behind it. While ORCID has a lot of backers, it is not clear whether a single researcher ID is feasible, though I believe we’ll discover the answer based on the success (or not) of the project. In general, I think that most of the attendees will be paying attention to ORCID, but a wait-and-see approach is likely, as there are many, many issues around researcher IDs that still need to be worked through.
The next topic was the assessment of research impacts and outputs. This topic was not especially technical, but it did bring about some interesting discussion about the impact of assessment activities, both positive and negative.
The next topic, linking research data to scholarly publications to faculty assessment, was a natural progression from the previous topic, and much of the discussion revolved around how to support such relationships. I must admit that while I think this topic is important, I didn’t feel that the discussion really resolved any of the potential issues with supporting researchers in linking data to publications (and then capturing this data for assessment purposes). What is clear is that publishing data, especially open data, is not as straightforward as one would hope once you get into the details, such as where to publish data, how to credit such publication, how the data is maintained, etc. There is a lot of work to be done here.
Next to be discussed was the preservation of data and software. It was brought up that the sustainability and preservation of data, especially open data, is somewhat analogous to the sustainability and preservation of software, in that both require a certain number of active tasks to remain continually usable. It is also clear that much data requires the proper software in order to be usable, and therefore the issues of software and data sustainability and preservation are in many senses interwoven.
The group then moved to a brief discussion of the harvesting and use of usage data. Efforts such as COUNTER and PIRUS2 were mentioned. The ability to track data in a way that balances anonymity and privacy against added value back to the user was discussed; the fact that usage data can be leveraged to provide better services back to users was a key consideration.
The next discussion topic was influenced by the OAI7 workshop. The issue of the synchronization of resources was discussed, and during OAI7, there was a breakout session that looked at the future of OAI-PMH, both in terms of 1.x sustainability as well as work that might result in OAI-PMH 2.0. Interestingly, there was some discussion of whether data synchronization is even needed with the advent of linked data; I can see why this would come up, but I personally believe that linked data isn’t at the point where other methods for ensuring synchronized data aren’t necessary (nor may it ever be).
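For readers less familiar with how OAI-PMH keeps a harvester in sync, the basic idea is that the harvester periodically asks a repository for records that have changed since its last visit. Below is a minimal sketch of building such a request. The endpoint `repository.example.org` and the helper name `list_records_url` are hypothetical, and the sketch only constructs the URL; it does not contact a server or handle paging.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, since: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request for records changed since a date.

    base_url is a hypothetical repository endpoint (an assumption for
    illustration); since is a date string such as "2011-06-01".
    """
    params = {
        "verb": "ListRecords",              # OAI-PMH verb for harvesting records
        "metadataPrefix": metadata_prefix,  # metadata format to request
        "from": since,                      # only records changed on or after this date
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of the request.
url = list_records_url("https://repository.example.org/oai", "2011-06-01")
print(url)
```

A harvester would remember the date of its last successful run and pass it as `since` on the next one, which is the kind of pull-based synchronization that the linked data discussion was questioning.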
Speaking of linked data, the concept arose in many of the SITS discussions, though the group did not officially address it until late in the agenda. I must admit that I’ve yet to drink the linked data lemonade, in the sense that I really don’t see it being the silver bullet that many of its proponents make it out to be, but I do see it as one approach for enabling extended use of data and resources. One challenge of the linked data approach that came up in the discussion was the need to map between ontologies.
At this point, it was getting a bit late into the meeting, but we did talk about two more topics: one was very pragmatic, while the other was a bit more future-thinking (though there might be some disagreement on that). The first was a discussion about where in the organization digital preservation architectures were being supported: by central IT, by the Library IT, or otherwise? It seemed that (not surprisingly) a lot depended upon the specific organization, and that perhaps more coordination could be undertaken through efforts such as PASIG. The second discussion was on the topic of “nano-publications”, which the group defined as “things that simply tell you what is being asserted (e.g. Europe is a continent)”. I must admit I got a bit lost about the importance and purpose of nano-publications, but again, it was close to the end of the meeting.
BTW, as I’m finishing this, an email just came through with the official notes from the SITS meeting, which can be accessed at http://eprints.ecs.soton.ac.uk/22546/
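One way to picture the group’s definition is as a single subject-predicate-object statement. The sketch below is only an illustration of that idea: the `Assertion` class and its field names are hypothetical and are not drawn from any nano-publication specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A single minimal statement: subject, predicate, object (hypothetical model)."""
    subject: str
    predicate: str
    obj: str

    def as_sentence(self) -> str:
        # Render the three parts as a plain-language statement.
        return f"{self.subject} {self.predicate} {self.obj}"


# The example given in the meeting: "Europe is a continent".
europe = Assertion("Europe", "is a", "continent")
print(europe.as_sentence())

# Because the dataclass is frozen, identical assertions compare equal and
# can be de-duplicated in a set.
unique = {europe, Assertion("Europe", "is a", "continent")}
print(len(unique))
```

Treating each assertion as a small immutable value is just one design choice; it makes it easy to see how many sources are asserting the same thing.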