Open Discovery

I was lucky enough to attend a session at ALA on the growing field of Discovery Systems. A group of people representing content providers, discovery system vendors, libraries, and standards bodies came together for a preliminary discussion toward an Open Discovery Initiative. As this was a very preliminary discussion, nothing more tangible came out of this meeting than the need for more discussions, and the sense that people think this initiative is necessary and timely. However, this work could potentially have a significant impact, and I’d like to write a little about my thoughts here.

It can be difficult to come up with a cohesive summary of a loose, early-days discussion, so I give the caveat that I could be a bit off base in my impressions of what this initiative is about. All I know is what I took from the discussion, and what I personally feel should be our prerogatives in this area. If what I write here is far from the reality of what the other people in the room had in mind, the fault is entirely mine: as a relative newbie in a room full of libraryland veterans, I am as yet unfamiliar with a certain level of vocabulary and tone, and so it’s possible that in my attempt to get a big-picture sense of where this initiative is going, I am making certain assumptions that don’t stand. That being said, here is my loose sketch of where this initiative may be headed.

The market for what I am going to call web-scale discovery systems has exploded in the last year or two. By web-scale, I refer to systems that attempt to pull together the entire range of materials to which libraries can provide access, including print books, ebooks, full-text online journals, abstracting and indexing databases, digital archives, Open Access journals on the web, and so on (although at this point these systems generally include only our local catalogs as well as the databases and full-text journals to which we subscribe). Content is included in these systems largely through deals that are made between the system vendor and various content providers. The Open Discovery Initiative is about finding some kind of standard that will allow data to be more easily added to a discovery system (at least, I think that’s what it’s about). Essentially, this would be a technical standard that would allow discovery systems themselves to more easily discover resources. It would be easier for publishers to add their content, easier for libraries to add unique content we hold, and easier for content on the web to be added to an index.

At this point the ODI isn’t about user interfaces or audience needs, but about the technical back-end that would allow data to be more discoverable, and about transparency in the creation of these discovery systems so that libraries know what, specifically, is being discovered, how it’s being indexed, and how it’s being presented to users.

This initiative is trying to pull together all of the relevant stakeholders, including content providers, system vendors, and libraries. I think it’s very important that all stakeholders are involved, but I also think this could create some friction, because we all have different interests: Vendors want to protect their trade secrets, so transparency around how things are indexed, how relevancy ranking works, and how metadata is used isn’t really in their best interest. Content providers, especially abstracting and indexing services, want to protect their livelihoods, and releasing metadata to systems in an open way, and allowing it to be searched without authentication, certainly isn’t in their best interest. And libraries, well, we want to provide wide access at little cost, and that is pretty well opposed to the interests of our partners in this endeavor.

In some respects I see this initiative as being in line with re-thinking the ILS, and re-thinking bibliographic metadata. I think that we have to partner with publishers and vendors, but I worry that we will give away too much to them, without being insistent on our own needs. I wonder if it doesn’t make sense for libraries to first start thinking about what we need from the information landscape, and then start working with vendors to make it happen. This largely stems from the fact that I’m a big-picture thinker who likes to have all my ducks in a row before I jump in and get my hands dirty, and I am definitely not going to insist that that is the best approach; in fact, it probably isn’t. But I am wary because I think library systems have been shaped by vendor needs, rather than our own needs, for a long time.

At the very least, this will be an interesting thing to keep an eye out for, and I think libraries need to be very involved, so decisions aren’t being made without us. It would be great if we could come out of this with some consensus around how publisher metadata and library metadata (which is really, I think, what we’re talking about here) can work together more effectively. It would be even better if libraries came out of this with some sense that there are systems out there that really and truly let us start breaking down the information silos that make up so much of our technical infrastructure.

I’m hoping I didn’t get any of this wrong; it’s easy to misunderstand conversations that are, by their very nature, kind of nebulous. It was certainly interesting, and I think fruitful, to be in a room with people who have different stakes in the outcome, and to be reminded that libraries’ needs aren’t the only needs that matter (although I sometimes really wish they were). I’ll look forward to learning more about this as the Open Discovery Initiative takes more solid shape.