eLINKs to eDOCUMENTS - "the real thing"
Sjoerd Vogt, The Dialog Corporation, Witney, United Kingdom
Information available over the web is increasing exponentially, but searchers continue to recognise the enormous value of secondary information as a stepping stone to the primary. The traditional abstract & index databases are able to find the proverbial needle in a haystack where web search engines fail abysmally. However - users increasingly expect to be able to link directly from the secondary information source to the primary full-text, and many publishers and aggregators already have such e-Linking technologies in place. The Dialog Corporation has also recently launched an e-linking service for Datastar and for the OnDisc range of products – and will be adding e-linking to Dialog later this year. What was Dialog able to learn from those organisations who paved the way – and what were the choices that had to be made?

INTRODUCTION

The glut of free information on the web is blamed for many things - including some things that have absolutely nothing to do with it. Over the past decade we first saw a perceived devaluing of secondary information, the theory being that primary information would be made available free, directly to users - perhaps under alternative publishing models. More recently, however, we’ve begun to see a counter-trend. Users increasingly want single points of entry for their information gathering process; points of entry that offer them guaranteed reliability, content, quality, and above all - comprehensiveness. This is leading to a re-evaluation of the traditional secondary databases - and an increased appreciation of the important role that they play in finding that proverbial needle in a haystack.

BUT - such secondary databases only have a valid role in the information gathering process IF they then lead to the primary information. Traditionally, this linking through to the primary took the form of hardcopy document ordering and delivery - often involving dedicated staff who were (and still are!) document delivery specialists.

However, user expectations are changing very rapidly in this respect. Internet-literate users now expect to be able to hyper-link directly through to the primary information referred to - and nothing less than the article level will do.

For this reason, many of the online hosts, aggregators and publishers have all been working hard to give their users reliable e-Linking.

Dialog is no exception - and over the past year has been actively working on different e-Linking services. The Dialog Corporation has recently launched an e-linking service both for Datastar and also for the OnDisc range of products - and will be adding e-linking to Dialog later this year.

We aren’t the first - but it’s better to be “leading edge” rather than “bleeding edge” - as they say. What was Dialog able to learn from those organisations who paved the way - and what were the choices that we had to make?
 

GOLD EDOCS

The issues and decisions for OnDisc e-Linking and Datastar e-Linking were very similar - if not identical. Gold eDOCs adopts a “just in time” policy - only generating the necessary URLs for e-linking on request - whereas Datastar e-links operates a “just-in-case” policy - generating URLs for ALL the records in the relevant set of secondary information records. However - for the purposes of this paper, we will concentrate on “Gold eDOCs” - the e-Linking service that is the link between your OnDisc bibliographic database records and the electronic fulltext. Each record retrieved in a search will have its own “eDocs” button. When you point and click, a real-time generated URL will then take you directly to the electronic fulltext.
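The difference between the two policies can be illustrated with a minimal Python sketch. It is not Dialog's implementation: the record fields, the `build_url` helper and the `links.example.org` address are all assumptions made purely for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]


def build_url(record: Record) -> str:
    # Hypothetical URL pattern; the real link format is not given in this paper.
    return (
        "https://links.example.org/resolve"
        f"?issn={record['issn']}&vol={record['volume']}&page={record['page']}"
    )


def just_in_case(records: List[Record]) -> Dict[str, str]:
    """'Just-in-case': generate a URL up front for every record in the set."""
    return {r["id"]: build_url(r) for r in records}


class JustInTime:
    """'Just-in-time': generate a URL only when the user asks for one record."""

    def __init__(self, records: List[Record]) -> None:
        self._records = {r["id"]: r for r in records}
        self.generated = 0  # counts how many URLs were actually built

    def on_click(self, record_id: str) -> str:
        self.generated += 1
        return build_url(self._records[record_id])
```

With a result set of several hundred records, the just-in-case approach builds several hundred URLs before the user has looked at any of them, whereas the just-in-time approach builds one URL per click.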

One of the very flexible features of Dialog’s approach is the use of a “Remote Links Server” (RLS). The URL that links to the fulltext at the article level is generated by a central links server maintained by Dialog, and this same RLS is used by all the different Dialog services. There are therefore no static, locally stored “links” databases at customer sites that go out of date as soon as they are published. It is worth noting that where aggregators have chosen to hold the databases of links at customer sites, those locally stored databases can be extremely big. As the number of e-journals available continues to increase, this problem will only get worse.

Also very important is the ability to configure Gold eDOCs so that your users will only be presented with “live links” - those for titles that you subscribe to and therefore have access to. It would be very annoying and frustrating for your users if they were to follow a link through to an e-Document - only to find that they did not have access to it. This is often called “the Harvard problem” - and Gold eDOCs has a solution for it: it will only generate URLs for your titles, via your chosen sources. This works equally well on agent-hosted or locally hosted databases.
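The “live links” idea amounts to checking each candidate link against the customer's holdings before a button is offered. The Python sketch below shows one way this could look; the `Holding` structure, its fields and the sample data are assumptions for illustration, not the actual Gold eDOCs configuration format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Holding:
    issn: str        # journal the customer subscribes to
    source: str      # the customer's chosen publisher or aggregator
    first_year: int  # earliest subscribed year (assumed field)


def live_link_source(issn: str, year: int, holdings: List[Holding]) -> Optional[str]:
    """Return the source to link to, or None if the customer has no access."""
    for holding in holdings:
        if holding.issn == issn and year >= holding.first_year:
            return holding.source
    return None


# Hypothetical holdings for one customer site.
HOLDINGS = [Holding(issn="1234-5678", source="aggregator-a", first_year=1997)]
```

A record whose journal is not in the holdings list, or whose year predates the subscription, yields `None`, so no link is shown and the user never reaches a page they cannot open.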

We’ve made the decision that Gold eDOCs should be free of extra charge. The user has already taken out a subscription to the secondary database and to the primary journal. The link between them is software functionality that is therefore part of their subscription.
 
 

Gold eDOCs Architecture

By using an open architecture that is centred around Dialog’s Remote Links Server, the user has maximum flexibility in implementing and integrating Gold eDOCs into their own network. Once the URL has been generated by DIALOG’s RLS, the request is passed through via a second browser window to the fulltext publisher or aggregator chosen. Under non-disclosure, the customer could even use the Gold eDOCs XML request/receive structure to set up their own “Remote Links Server”.
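The actual Gold eDOCs XML request/receive structure is not published here, so the following Python sketch only illustrates the general shape of such a round trip. Every element name (`linkRequest`, `linkResponse`, `url`), the `fake_rls` stand-in and the `fulltext.example.org` address are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_request(citation: dict) -> str:
    """Serialise the citation fields into a (hypothetical) XML request."""
    root = ET.Element("linkRequest")
    for field, value in citation.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def fake_rls(request_xml: str) -> str:
    """Stand-in for a links server: turns a request into a response with a URL."""
    request = ET.fromstring(request_xml)
    issn = request.findtext("issn")
    volume = request.findtext("volume")
    page = request.findtext("page")
    response = ET.Element("linkResponse")
    # Hypothetical article-level URL pattern.
    ET.SubElement(response, "url").text = (
        f"https://fulltext.example.org/{issn}/{volume}/{page}"
    )
    return ET.tostring(response, encoding="unicode")


def parse_response(response_xml: str) -> str:
    """Extract the generated URL, ready to open in a second browser window."""
    return ET.fromstring(response_xml).findtext("url")
```

The client only ever sees the request and the response, which is what would allow a customer to substitute a links server of their own behind the same interface.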
 

Gold eDOCs Partners

Gold eDOCs is already able to link through to most of the world’s leading fulltext aggregators and publishers - giving access to more than 3000 e-Journals. Discussions are underway for further linking agreements, which will be announced as they are agreed. The Gold eDOCs architecture means that new e-Linking partners can be set up at the customer site within minutes, once the central rules have been established on the Remote Links Server.
 

LESSONS LEARNT IN DEVELOPING GOLD EDOCS
 

Scalability

If the problem is to set up links from just one secondary database, then this is relatively easy - and can be done with hard-wired links from each record - such as happens with the NLM’s “PubMed”. If e-Linking is just to one publisher, then this is also straightforward. “CompendexWeb” from Ei is such an example - in that it links only to the 460 journals on Elsevier’s “Engineering Direct”.

However - the problem that a host such as Dialog faces is of an entirely different magnitude.  E-Linking now needs to work from any relevant database to any publisher or aggregator - and it needs to take into account all the vagaries of  incomplete information and inconsistent formats.

In the Listserv “NEWJOUR” (http://gort.ucsd.edu/newjour/) there are 8616 e-Journals listed as of April 10th 2000 - with approximately 150 new e-Journals being added every month. Interestingly, looking at the total of all e-Journals, the addition of new e-Journals has been approximately linear over the past five years. We see a similar pattern if we analyse the e-Journals available from one particular host (OCLC in this case). This linear increase is in sharp contrast to the usual expectation that everything on the internet grows exponentially.

It is clear that the number of e-Journals is still only a small percentage of the total number of journals. When you consider that a big database such as InSpec could easily cover more than 10,000 serials, then it is likely that only a small proportion of these will produce valid e-links at present. Similarly, if we compare the NEWJOUR number of 8000+ e-Journals with the total number of ISSNs issued to end 1998 (nearly 900,000) then this also shows that the number of serials available over the internet is at present only a low single-digit percentage (roughly 1%). Nonetheless - this tiny proportion does already include most of the top serials subscribed to. It is an interesting question as to how the remaining 90%+ of serials will go electronic - and how quickly. The inevitable conclusion is that existing systems using locally held databases of links at customer sites are not sustainable.
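The proportion can be checked with a few lines of Python, using only the figures quoted above. The ISSN total is taken as the rounded “nearly 900,000”, so the result is approximate.

```python
e_journals = 8616        # NEWJOUR listing as of April 10th 2000
issns_issued = 900_000   # approximate total of ISSNs issued to end 1998
new_per_month = 150      # approximate monthly additions to NEWJOUR

# Share of all serials that are available as e-Journals, in percent.
share_percent = e_journals / issns_issued * 100

# Additions per year if the approximately linear growth continues.
new_per_year = new_per_month * 12

print(f"{share_percent:.2f}% of serials; about {new_per_year} new e-Journals a year")
```

On these figures the share is just under 1%, and linear growth adds about 1800 e-Journals a year.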

Each e-Journal may be available via a number of different routes - including the original publisher (such as Springer Verlag) and the top e-Journals aggregators (such as SwetsNet and Rowecom IQ). Dialog’s e-Linking therefore needs to be able to link from 400+ databases through to articles hosted by many dozens of different aggregators and publishers.
 

Design Issues

Mutual Customers or Pay-Per-View?  Both is best. However - the linking to mutual customer subscriptions must come first. The first release of Gold eDOCs is able to link to existing customer subscriptions. The authentication and access to the fulltext is not in any way handled by Gold eDOCs.
However - Pay-Per-View e-Linking with credit card billing is actively being developed for all Dialog’s services (OnDisc and Online) - and clearly offers an important route to fulltext journals that are not used so heavily.

Reliability - or Single Click?  It’s superficially tempting to imagine that a single-click link to the fulltext is the ideal. However - primarily because of the vagaries of inconsistent and incomplete information - the ACTUAL ideal is to give the user the most reliable link possible under the circumstances.

Dialog does this by first populating a “Submit” form - which can then be checked by the user before being submitted to the Remote Links Server for the URL to be generated. The Submit form will be available to the user even if the information is incomplete. However - a “traffic light” system indicates to the user whether a valid link is likely: Green for Go, Amber for Caution, and Red for No. This system maximises the likelihood that the user will be able to link through to the primary article.
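The rules behind the traffic light are not spelled out in this paper, so the Python sketch below simply assumes one plausible scheme: green when every field of the form is filled, red when the journal cannot be identified at all, and amber otherwise. The field names and thresholds are illustrative assumptions.

```python
# Fields assumed to be needed for an article-level link (hypothetical list).
REQUIRED_FIELDS = ("issn", "volume", "issue", "start_page", "year")


def traffic_light(form: dict) -> str:
    """Classify a populated Submit form as 'green', 'amber' or 'red'."""
    missing = [name for name in REQUIRED_FIELDS if not form.get(name)]
    if not missing:
        return "green"   # complete data: a valid link is likely
    if "issn" in missing:
        return "red"     # journal cannot be identified: no link
    return "amber"       # some detail missing: the link may or may not resolve
```

Because the form is shown even on amber, the user can fill in a missing volume or page number by hand and submit it again.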

Flexibility - or Ease of Use?  Another balancing act. Both are clearly important to the user. The design of Gold eDOCs ensures that there is no compromise on all-important flexibility in trying to make it as easy to use as possible. However, system administrators must of course first set up their holdings information if “dead links” are to be avoided.

Linking to All Content Types: Linking at present is primarily with STM journals. However - Dialog’s databases cover every sector - and clearly the e-linking technologies implemented by Dialog need to handle every type of information. In addition, primary information comes in a rainbow of varieties; from traditional reports and articles, through to multi-media presentations and audio-visual materials. The ultimate aim is to provide multi-directional linking from and to any type of information.

Future Proofing: Dialog is actively monitoring the DOI and CrossRef initiatives. Gold eDOCs has been designed to be fully compatible with these initiatives - and others.
 

Publisher Attitudes

Gold eDOCs is able to link through to nearly one dozen aggregators or publishers - with more imminent.  It is interesting to note that not all potential partners are equally enthusiastic!
Reactions are of six different types:
YES - Please, we want as many links as possible!
WAIT - we need a corporate strategic review of this
WAIT - we don’t have our technology ready yet
NO - you can link to our home page but not to our articles (this is NOT what the users want!)
NO - Go away! You’re a competitor
ZZZZzzzz - if we ignore you, perhaps you will go away...
 
 
 

CONCLUSIONS
 

It’s early days for e-linking. We’re going to be seeing increasing demand for multi-directional links that loop back on themselves and perform creative contortions. Everything has to link to everything. It’s clear that there is a need for a “server” that provides all of these links. The concept is relatively straightforward - but the implementation is not.
Most of the problems stem from unstructured or incomplete data - and solutions such as Gold eDOCs can only be so clever.
It may be time to reconsider some of the sacred cows of the information industry.
It is wishful thinking to imagine that everything will have a DOI/CrossRef  - and this will only happen slowly. But - why not a standard bibliographic reference format?
With the help of appropriate rulebases, this would then allow you to generate the relevant URL for the fulltext hosts in question. But - why not a standard URL format?
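To make the rulebase idea concrete, here is a minimal Python sketch in which each fulltext host is described by a URL template and a standard bibliographic reference is slotted into it. The host names, templates and reference fields are invented for illustration; no real publisher's URL scheme is implied.

```python
from urllib.parse import quote

# Hypothetical rulebase: one URL template per fulltext host.
RULEBASE = {
    "host-a": "https://host-a.example.org/{issn}/{volume}/{issue}/{start_page}",
    "host-b": "https://host-b.example.org/article?issn={issn}&vol={volume}&spage={start_page}",
}


def make_url(host: str, reference: dict) -> str:
    """Apply the host's rule to a bibliographic reference to build a URL."""
    template = RULEBASE[host]
    # Percent-encode each value so it is safe inside a URL.
    safe = {name: quote(str(value), safe="") for name, value in reference.items()}
    return template.format(**safe)


reference = {"issn": "1234-5678", "volume": 12, "issue": 3, "start_page": 45}
print(make_url("host-a", reference))
print(make_url("host-b", reference))
```

With a standard reference format, adding a new host means adding one template. With a standard URL format as well, even the per-host templates would no longer be needed.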