Content

Metadata
Newsletter	20030807
Language	German
Version	1.0
Published by	NEWSLETTER\administrator
Publication date	02.05.2007 09:42:29

How digital will libraries ever be?

Guest contributions

Musings on the limits of a popular metaphor

Guest contribution by Dr. Stefan Gradmann (stefan.gradmann@rrz.uni-hamburg.de), Universität Hamburg¹. Dr. Gradmann is one of the organisers of the international OpenSafe consortium, in which PROJECT CONSULT also participates.


	1.	Introduction

During the past decades we have experienced the growth of library automation and thus a kind of ‘electrification’ of librarian services. During the same period, internet-based and later WWW-based information services have emerged, quickly growing into a parallel universe of electronic information, and for quite some time both paradigms of information organization have developed independently. The moment they effectively came into contact, the ‘Digital Library’ metaphor was coined, which was useful for reconciling both paradigms, at least rhetorically and for a transitional period.

Today, as we seem to approach the end of this period, it may be useful to reconsider the Digital Library metaphor and to consider terminological and conceptual alternatives. The point of departure here is that the term ‘Digital Library’ hides a number of trivial, yet vital differences: for instance, the difference between ‘descriptive’ librarian metadata and ‘identifying’ WWW metadata, the different nature of the information objects they point to (‘books’ vs. electronic documents), and the way they refer to these objects via shelfmarks and URLs respectively. A look at such differences is therefore a helpful starting point. Only afterwards can we usefully return to scenarios integrating both paradigms and discover the integrating potential of concepts inherent in the Functional Requirements for Bibliographic Records (FRBR) and of ‘Semantic Web’ technology.


	1.1	What this paper is not about ...

Over the past 30 years – and thus along with the advent of library automation – numerous speculations have been published. Most of them were concerned either with the imminent death of libraries, which were seemingly doomed to be replaced by some omnipotent electronic successor, or with “business as usual” proclamations basically stating that libraries – even if ‘electrified’ to the extreme – would ultimately continue to function the way they had done for centuries.

Over the last decade – along with the short history of the internet – such speculations intensified considerably and increasingly focussed on aspects of information technology and information economy as prominently present in the information and communication models of the World Wide Web. These speculations have led to sometimes astonishing and radical conclusions and assertions. To give just two examples: WWW-based information services such as Google or Yahoo! were supposed to take over library functions altogether, or librarians were expected to catalogue all quality information on the internet.

None of these radical changes has actually happened – and still a lot has changed. The speculative urge to make projections and predictions in this field has certainly been fed by the common feeling that something fundamental is happening to our ideas and techniques of dealing with information and to our concepts of information themselves. Still, tempting as they may be in a period of profound uncertainty, such projections, which use metaphors of the past to predict the shape of future electronic information landscapes, essentially do not transcend the intellectual qualities of a Star Trek movie.

The present paper tries to avoid bad library science fiction in general and predictions as mentioned above in particular. The author assumes that we can hardly make any valid statements except concerning the very near future, but that it may be useful instead to describe as precisely as possible what changes and differing approaches can currently be identified in some fields of scientific information technology and economy as well as to reach an adequate level of abstraction in the description of such changes and differences.²


	1.2	... and what this paper does attempt

The paper is thus mainly concerned with the borderline between the way information is organized in electronic library catalogues on the one hand³ and in genuine WWW-based information repositories on the other hand. The main goal here is to identify some of the fundamental differentiating characteristics, be it in terms of the information entities themselves and the way they are conceptualised or the way they are referenced and their identity is established in the respective contexts, or be it in terms of the actual modes of collaboration within librarian cataloguing environments and WWW-based information repositories.

A better understanding of such differences may in turn help to clarify what actually happens within the overlapping zone between both worlds. Whenever a catalogue record points to information in the WWW domain, or whenever an internet search engine encounters catalogue applications with their index files and library metadata, concepts and mechanisms from two different paradigms of information organization are made to coexist. Together they create a hybrid setting that can be understood better if the originating contexts of the respective mechanisms are kept in mind. The main goal thus is to identify differences and relevant questions – very few answers (if any!) can be expected from this paper. The overall aim is to help describe the often complex relation between electronic catalogues and WWW-based information repositories in terms of mutual redundancy, competition and (sometimes and hopefully) convergence.

And if, at the end of the argument, some useful hints can be given concerning the possible ways for both worlds to co-evolve in the near future, this paper will have reached its (modest) objectives.

It should also be made clear that this paper is written from a librarian perspective: its author – although presently active in the borderline area between both worlds – has a strong background in the union catalogues community, and the audience for the presentation of this paper consists of librarians and technicians active in union catalogue environments. The paper may thus fail to identify some points of concern that are of specific interest to a W3C community and probably over-emphasizes aspects that may seem completely trivial to an audience with a primary WWW perspective.


	2.	The risks of pragmatism: 2 ½ examples

In order to illustrate that the need for conceptual clarification is real and clearly of interest beyond mere academic reasoning, it may be useful to consider two concrete examples taken from the author’s daily working context. These are by no means intended to indicate appropriate solutions – rather the contrary is the case – but should simply illustrate the need for a more thorough and appropriate conceptualisation of a complex matter beyond mere pragmatism. Both examples are concerned with the coexistence of library catalogues and WWW-based information services.


	2.1	“Make the WWW part of the Catalogue”

The first example is concerned with a situation most readers of this paper – at least those from the ‘hybrid’ library world – will be familiar with: the need to present coexisting printed and electronic manifestations of works to library users in a consistent service model, more specifically in the area of printed and electronic journals.

Until recently, holdings of electronic journals have not been systematically integrated in library union catalogues – even though many participating libraries spend increasing sums of money on enabling their users to access such resources via licensing agreements. This has led to a situation where libraries have started to build sometimes vast link repositories for electronic journals outside their respective OPAC environments. Along with these developments, a very impressive repository of electronic journals metadata and of library ‘holdings’ (in terms of license agreements) has been built on a national scale in Germany, the “Elektronische Zeitschriftenbibliothek” (EZB)⁴. From a user perspective, the major unsatisfying aspect of this situation is that different ‘catalogue’ environments have to be used depending on whether a printed or an electronic resource is to be retrieved, and that there is no way of retrieving both kinds of resources using one single interface.

The problem is common to all ‘hybrid’ library architectures and is systematically recurrent on all scales – from the context of a single library to the issue of how to relate resources like CORC and WorldCat to each other.

One of the pragmatic reflexes of the library community in this situation is to try to integrate as many pointers to internet resources as possible into the librarian information systems, and thus to make parts of the WWW a part of their catalogues – and one of the union catalogues the author of this paper is working with is about to move in that direction. One idea currently discussed within this union catalogue is to simply load all metadata from EZB (the nation-wide repository) into the union database, creating holdings data for the participating libraries, and thus to ensure replication of these metadata together with the ‘holdings’ information to the participating libraries’ OPAC environments.

However (and quite paradoxically) this creates one specific problem in the case of freely accessible electronic journals such as D-Lib Magazine or First Monday: no license agreements are necessary to access these resources, and as a consequence no library-specific ‘holdings’ information can automatically be generated for them. Here again, an extremely pragmatic solution has been devised: simply add ‘holdings’ for all libraries participating in the union catalogue in the case of such free electronic resources.

The resulting situation is illustrated by the following drawing:

[Drawing not reproduced.]

Even superficial reflection on this example reveals that, in pragmatically solving a specific problem in such a way, this union catalogue is probably about to create numerous new problems immediately. To name just one aspect, the concept of ‘holding’ information, which already is a questionable construct in the case of licensed electronic material, loses consistency almost altogether in such an approach. We will come back to this aspect as well as to the overall risks of inconsistency induced by such an approach in later parts of this paper – the aspects already mentioned may be sufficient to illustrate the problematic nature of an approach systematically trying to integrate pointers to WWW resources in library catalogues.
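The loss of consistency in the ‘holdings’ concept can be made concrete with a minimal sketch. It assumes a much simplified data model: the names `Journal` and `derive_holdings`, the library names and the licence table are hypothetical and are not taken from any actual union catalogue or from EZB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    title: str
    licence_required: bool


def derive_holdings(journal, licences, member_libraries):
    """Return the set of libraries that receive a 'holdings' record."""
    if journal.licence_required:
        # Regular case: a holding stands for an existing licence agreement.
        return {lib for lib in member_libraries if (lib, journal.title) in licences}
    # Pragmatic workaround: a free journal gets a holding for every member,
    # although no library 'holds' anything in the usual sense.
    return set(member_libraries)


# Hypothetical sample data for illustration only.
members = ["Library A", "Library B", "Library C"]
licences = {("Library A", "Licensed Journal")}

licensed_holdings = derive_holdings(Journal("Licensed Journal", True), licences, members)
free_holdings = derive_holdings(Journal("First Monday", False), licences, members)

print(sorted(licensed_holdings))
print(sorted(free_holdings))
```

In this sketch the same ‘holdings’ value means “a licence exists” in one branch and merely “the resource exists” in the other, which is one way of reading the inconsistency described above.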


	2.2	“Make the Catalogue part of the WWW”

The alternative (or possibly complementary) approach is often considered when discussing the fact that librarian information resources tend to be ignored within the overall information economy of the WWW, mainly because they are part of what has been called the “hidden web”: metadata contained in library catalogues are mostly ignored by the leading search engines simply because the application layer used to access these records is not transparent to generic WWW technology and therefore “hides” the resources it should make accessible.

Solutions to this problem are often discussed in terms of making library catalogues WWW-transparent more systematically and thus simply making catalogues part of the WWW more generally. The overall aim of such strategies is to ensure the presence of metadata from library environments (OPAC or union catalogues) in result sets generated via WWW-based search engines, and eventually even to ensure that these get ranked highly by virtue of their high granularity and the quality of the indexing information they include.
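One way such ‘WWW-transparency’ could look in practice is sketched below: a catalogue record is rendered as a static HTML page that a generic crawler can fetch, with the metadata embedded as Dublin Core `<meta>` elements. This is an assumption made for illustration only; the function name `record_to_html` and the sample record are hypothetical, and nothing is claimed here about how any particular search engine treats such tags.

```python
from html import escape


def record_to_html(record):
    """Render a catalogue record as a static page with Dublin Core meta tags."""
    metas = "\n".join(
        f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        for element, value in record.items()
    )
    title = escape(record.get("title", ""))
    return (
        f"<html><head><title>{title}</title>\n{metas}\n</head>"
        f"<body><h1>{title}</h1></body></html>"
    )


# Hypothetical record using Dublin Core element names as keys.
page = record_to_html({"title": "Ulysses", "creator": "Joyce, James"})
print(page)
```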

Even though appealing at first sight, the consequences of such a strategy could be far from desirable, especially if such an approach were adopted by all major university and research libraries plus a significant number of union catalogues. The first and most striking effect would be extreme redundancy of information, quickly approaching unwanted levels of information entropy: which user would actually wish to be submerged by thousands of metadata records pertaining to James Joyce’s “Ulysses” from libraries all over the world when doing a search for “Ulysses” in Google?
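The redundancy effect can be illustrated with a small sketch, assuming that every library exposes its own record for the same work. The normalised creator/title key used for grouping is a deliberately naive assumption, and the names `work_key` and `group_by_work` as well as the sample records are hypothetical.

```python
from collections import defaultdict


def work_key(record):
    """Naive matching key: case-folded, whitespace-trimmed creator and title."""
    return (record["creator"].casefold().strip(), record["title"].casefold().strip())


def group_by_work(records):
    """Map each work key to the list of libraries exposing a record for it."""
    groups = defaultdict(list)
    for record in records:
        groups[work_key(record)].append(record["library"])
    return dict(groups)


# Hypothetical records as they might appear side by side in one result set.
records = [
    {"library": "Library A", "creator": "Joyce, James", "title": "Ulysses"},
    {"library": "Library B", "creator": "joyce, james", "title": "Ulysses "},
    {"library": "Library C", "creator": "Joyce, James", "title": "Dubliners"},
]
groups = group_by_work(records)
print(groups)
```

A result list without such grouping would show one hit per library; with thousands of contributing catalogues the number of near-identical hits grows accordingly.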

Moreover – and more annoying still – users would then be confronted with result sets that point to information objects in very different ways: while direct access to an information resource via a URL pointer may be possible in some cases, the user would be confronted with differing principles of mediated access in the case of metadata originating from libraries. This effect would surely cast some doubt on the results of such a strategy of “unhiding” library resources.


	2.3	More integration strategies … and the need for distinctions.

A third prominent integration strategy will just be mentioned in this opening context: the systematic use of library systems as gateways to WWW resources⁵. Integration strategies built around concepts of open and context sensitive linking as part of library information infrastructures⁶ can be seen as a more generic – and possibly appealing – variant of this approach.

Without further discussing these and other integration strategies in detail at this stage of the argument, it should have become clear that any over-pragmatic strategy that simply combines library and WWW resources and remains unaware of the fundamental differences between the respective information resources is unlikely to produce satisfying long-term results. This observation does not question the actual need for integration strategies (and we will come back to this point later in this paper) but should simply make clear to what extent such strategies need to be built on clearly established distinctions between the information landscapes we ultimately seek to combine.

The following sections of this paper are concerned with such distinctions. It should be made clear in this context that in order to establish clear distinctions the author will sometimes deliberately ignore ‘hybrid’ infrastructures in these sections – only after having established the basic difference at the heart of the respective argument will such hybrid (and mostly secondary) settings be re-introduced to blur the clear picture again.


	3.	Differing Basic Elements and Concepts: Entities, Pointers, Identities

Library and union catalogues on the one hand and WWW-based information resources such as Yahoo or Google, or any repository built on a metadata harvesting protocol such as the one specified by the Open Archives Initiative (OAI-PMH), on the other hand share a number of basic instances and entities as part of their information infrastructure. They mostly contain a distinct metadata layer including pointers to the actual information objects, together with a user interface typically including support for search and retrieval operations. Furthermore, some means to identify users and information objects must be present somewhere within the respective system – the authentication layer – together with functions that are used to determine what kind of operations a given user (or class of users) may apply to a given information object (or class of objects) – the authorisation layer.
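The shared components named above can be written down as a minimal sketch. It assumes a toy in-memory model; the class names `MetadataRecord` and `InformationSystem`, the plain-text credential store and the sample data are hypothetical simplifications and describe neither a real library system nor a real WWW service.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    title: str
    pointer: str  # a shelfmark in a catalogue, a URL in a WWW-based system


@dataclass
class InformationSystem:
    records: list
    credentials: dict  # authentication layer: user name -> secret
    permissions: dict = field(default_factory=dict)  # authorisation layer

    def search(self, term):
        """Search and retrieval on the metadata layer (case-insensitive)."""
        return [r for r in self.records if term.casefold() in r.title.casefold()]

    def authenticate(self, user, secret):
        """Authentication layer: is this user who they claim to be?"""
        return user in self.credentials and self.credentials[user] == secret

    def authorise(self, user, record, operation):
        """Authorisation layer: may this user apply this operation to this object?"""
        return operation in self.permissions.get((user, record.pointer), set())


# Hypothetical sample data for illustration only.
system = InformationSystem(
    records=[
        MetadataRecord("Ulysses", "X 1989/1234"),
        MetadataRecord("D-Lib Magazine", "http://example.org/dlib"),
    ],
    credentials={"reader": "s3cret"},
    permissions={("reader", "X 1989/1234"): {"read"}},
)
hits = system.search("ulysses")
print(hits)
```

At this level of abstraction the same skeleton fits both worlds; the differences only appear once one asks what the `pointer` string actually refers to and who resolves it.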

From such a perspective – and thus seen from 20,000 ft above the ground – information systems originating from the library world and from the WWW indeed have a lot in common: the following diagram visualises the basic components mentioned above and could be used to describe library information systems and genuine WWW-based systems alike.

[Diagram not reproduced.]

However, when getting closer to the ground, some basic differences begin to appear, and the following section is essentially concerned with those differences the author would term ‘distinctive’ (or ‘opposing’) properties (as opposed to variations in detail and granularity).

It may come as a surprise that – when looking at concrete examples – relatively few such distinctive or fundamental oppositions can actually be identified in the areas of search and retrieval operations and of ‘bibliographic’ metadata. Instead, this paper assumes that the main differences reside in the ways the information objects themselves are conceived, in the way access to these objects is organized and in the field of authentication and authorization mechanisms.

At second sight, the surprise should be moderate only: search interfaces for electronic library catalogues are a relatively young component of libraries and library co-operations. From the beginning of their short history they have evolved much more in line with features and requirements of generic, non-librarian automation technology than, for example, the books themselves, the nature of which has been shaped over centuries and long before the birth of electronic information processing.

As far as ‘bibliographic’ metadata are concerned, the assumption may be more controversial, especially within a librarian audience: after all, many librarians still regard metadata generation (in the sense of cataloguing) as the very heart of their business, and it may be hard for them to admit that vital issues may well be defined outside the scope of cataloguing principles and practice. Still, the assumption is maintained: many of the guiding principles of cataloguing that had their origins in the sequential organization of card catalogues and that were initially preserved in electronic cataloguing environments have either vanished or are at least being seriously reconsidered. And even in those cataloguing databases that still contain important layers of data oriented towards card-catalogue production, the creation of a DC-like interface is comparatively straightforward – much easier, anyway, than converting data the other way round: trying to generate traditional cataloguing data from a Dublin Core source would probably turn out to be much more of a challenge, if anyone were interested in the exercise at all.
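The asymmetry between the two conversion directions can be illustrated with a minimal sketch. It assumes MARC-21-style field tags purely for illustration (the paper names no specific cataloguing format), and the mapping table, the function name `to_dublin_core` and all sample values are hypothetical.

```python
# Assumed, deliberately incomplete mapping from (tag, subfield) to a
# Dublin Core element; real crosswalks are considerably more detailed.
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}


def to_dublin_core(marc_like_record):
    """Keep the fields that have a Dublin Core target and drop the rest."""
    dc = {}
    for (tag, code), value in marc_like_record.items():
        element = MARC_TO_DC.get((tag, code))
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


# Hypothetical record; all values are invented for illustration.
marc_like = {
    ("100", "a"): "Example, Author",
    ("245", "a"): "An example title",
    ("245", "c"): "by Author Example",  # statement of responsibility: dropped
    ("260", "b"): "Example Verlag",
    ("260", "c"): "1999",
    ("300", "a"): "250 p.",  # physical description: dropped
}
dc = to_dublin_core(marc_like)
print(dc)
```

The forward direction only discards detail. The reverse direction would have to reconstruct the dropped fields and the distinctions between them, which the Dublin Core record no longer contains.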

Furthermore, one of the few significant structural differences in the metadata area – the ‘holdings’ or ‘copy’ notion of library catalogues that has no real equivalent in WWW-based information services – can actually be addressed more appropriately in the sub-section about pointing and actual access to information objects.

Thus, even though much of the earlier discussion in the subject area of this paper has been focussed on metadata, the assumption is maintained that the crucial differences do not lie in this field, either.⁷

Instead, some very evident fundamental differences can and will be identified in the remaining three component areas – although this often means nothing more than recalling very simple and trivial truths that are still often forgotten when considering the relation between library catalogues and WWW-based information services.


	3.1	Books vs. digital information objects: the basic information entities

The first point to be aware of is the profoundly differing nature of the basic entities in terms of information objects. Library catalogues and automation systems are designed to contain descriptive cataloguing records for books and book-like printed information together with pointers to the actual physical copies of these as present on library shelves. WWW-based information systems are designed to contain identifying (and some basic descriptive) information pertaining to electronic information objects (and most typically hyperlinked objects stored somewhere in the network at any location that can be addressed via HTTP) together with pointers to these objects.

This distinction, as trivial as it may actually be, is often and quickly forgotten – and still it has to be raised to a very high level of abstraction before it ceases to have numerous consequences.

However, we will not deal with this distinction in detail here, since a lot has been published on many of its aspects. One such aspect is the fact that paper books and other paper publications are combined presentation and storage media, where the display of information is altogether visual and the content is physically tied to the paper and the pages of the publications, whereas in electronic publications storage and presentation are separate. Another is the fact that additional electronic devices are required for access to the content of digital information objects, whereas books can simply be read using our human senses. A third is the fact that automated operations on content are possible in electronic information objects in a way that is inconceivable for printed material.

The fact that many digital information objects still are modelled upon the example of printed books should not make us forget the fundamental differences between them: digital information objects will evolve from book-like analogies into new forms of information modelling, forms we do not yet have names for – and this fact is about the only excuse for using such terms as ‘e-books’.⁸


	3.2	Shelfmarks vs. links: the pointers from metadata to the information objects

The second area where both worlds differ substantially concerns the way they organize access to the actual information objects for their respective user communities. To state it simply, library-based information systems are originally based on the idea of mediated access, whereas the original principle of WWW-based systems is one of direct, instant access. The principal reason for this is the fact that librarian information objects (books and the like) simply are not kept within the information system (the catalogue) but on the library’s shelves, whereas in the case of WWW information systems the information objects technically are part of the system (or technically can be part of it, at least).

This seemingly trivial observation has two very important consequences for the respective architecture of these information systems:


	·	in a library information system, the user is interacting with metadata on all levels: not only with ‘bibliographic’ metadata, but also with a metadata substitute for the real information object within the information system, the copy record, which in turn contains a pointer to some instance outside the system that will mediate access to the information object for the user. WWW-based information systems have no equivalent of this ‘copy’ or ‘holdings’ layer, because the information objects themselves are a technical part of the system;
	·	as a consequence, the pointers to the actual information objects have fundamentally different functions within the respective systems: the ‘shelfmark’ or ‘lending number’ pointers point to some instance outside the library catalogue (a librarian or a lending module) that will interpret them and finally grant access to the information resource in a way the information system has no knowledge about, whereas the URL pointer (or any technical successor in WWW-based information architectures) basically points to the information object itself, which is technically kept within the system (not necessarily stored there physically, but certainly part of the system’s technical architecture).
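The two consequences listed above can be sketched as two record shapes. This is a minimal illustration under simplifying assumptions; the class names `CopyRecord`, `LibraryRecord` and `WebRecord`, the helper `access_path` and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CopyRecord:
    holding_library: str
    shelfmark: str  # free text, interpreted by someone outside the system


@dataclass
class LibraryRecord:
    title: str
    copies: list  # the 'copy' or 'holdings' layer


@dataclass
class WebRecord:
    title: str
    url: str  # points at the information object itself


def access_path(record):
    """Describe the steps between the metadata and the information object."""
    if isinstance(record, LibraryRecord):
        return [
            f"metadata -> copy record -> {c.holding_library} mediates '{c.shelfmark}'"
            for c in record.copies
        ]
    return [f"metadata -> {record.url}"]


# Hypothetical sample data for illustration only.
book = LibraryRecord(
    "Ulysses",
    [CopyRecord("Library A", "X 1989/1234"), CopyRecord("Library B", "ask at desk")],
)
paper = WebRecord("An online paper", "http://example.org/paper.html")

print(access_path(book))
print(access_path(paper))
```

The library record has one more level of indirection, and the last step of its path leaves the system; the web record has neither.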

These observations account for numerous functional and technical incompatibilities between librarian and WWW information systems, and it is important to fully understand their implications before combining working principles from both worlds. The ‘copy’ level of librarian systems is difficult to translate to the WWW world, and the pointers to the actual information resources respond to very different functional requirements.

The latter difference in particular deserves additional attention. The ‘shelfmark’ string in the library system may contain almost any information that can be interpreted by humans: from the actual shelfmark (“X 1989/1234” or the like) to information like “go to room 202 and ask there” or even simply “go and ask the librarian”. Even the copy or call number can be erroneous: the lending system module will not recognize it, and ultimately some librarian will be there to help with the matter. The pointer goes outside the system anyway, and the responsibility for resolving the pointing information is outside the system, too. This is the reason why our union catalogues and library OPACs, which contain such an amazing amount of incorrect shelfmark information, still do not cease to function.

The situation is radically different with URL pointers within WWW-based information systems: one character missing in a URL will typically produce nothing but an HTTP 404 error and will not reveal any information beyond this error message. In most cases, no external instance can be called upon to correct the pointing information: the correctness and reliability of the pointer are a vital constituent of the information system. This is why the protocols for constructing and resolving HTTP pointers are relatively strict and elaborate (even though insufficient: there will be successors to URLs as we know them today!) whereas shelfmarks and copy numbers are variable string values with almost no restrictions at all.
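The contrast between the two resolution mechanisms can be sketched as follows. The sketch assumes a toy in-memory ‘web’ with exact-match lookup in place of a real HTTP server; the names `WEB`, `KNOWN_SHELFMARKS`, `resolve_url` and `resolve_shelfmark` and all sample values are hypothetical.

```python
# Toy stand-in for a web server: only exact matches are served.
WEB = {"http://example.org/paper.html": "<html>...</html>"}

# Toy stand-in for a lending module that knows some shelfmarks.
KNOWN_SHELFMARKS = {"X 1989/1234": "stack level 2"}


def resolve_url(url):
    """Strict resolution: return (status, body); no one repairs a broken pointer."""
    if url in WEB:
        return 200, WEB[url]
    return 404, None


def resolve_shelfmark(shelfmark):
    """Tolerant resolution: anything unrecognised is handed to a human."""
    location = KNOWN_SHELFMARKS.get(shelfmark)
    if location is not None:
        return f"fetch from {location}"
    return f"ask the librarian about '{shelfmark}'"


print(resolve_url("http://example.org/paper.html"))
print(resolve_url("http://example.org/paper.htm"))  # one character missing
print(resolve_shelfmark("X 1989/1234"))
print(resolve_shelfmark("go to room 202 and ask there"))
```

The shelfmark resolver never fails outright because its fallback lies outside the system, whereas the URL resolver has no such fallback.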

Of course, notions of direct access to resources have been added to library based systems in the recent past and access control mechanisms and restrictions have been implemented in various ways in WWW-based systems – but still the original governing principles of mediated vs. direct access have been at the origin of the respective systems’ design and of the pointing mechanisms respectively used. This is an important fact to remember when one tries to understand what happens to Internet-pointers in library systems.


	3.3	Identity and credentials: authentication and authorization

Instances that are taken for granted in one information environment may cause close to metaphysical problems in another.⁹ This fact can be illustrated with one simple, yet striking example, considering the way persons and information objects are identified in both worlds and the way authorization to use a given resource is determined.

In the ‘real’ world, when trying to establish the identity of a library user, one simple and effective way would be to ask for his passport or ID card. A certain number of additional checks can then be performed: if the ID document bears the same name the user claims to be his own, the photograph therein bears at least some resemblance to the bearer, and the document has been issued by a trustworthy authority, the librarian may decide that the identity of that user has been established to a sufficient degree. And if that user wanted to borrow a book reserved, for instance, for local residents, a simple check of the address in the user’s ID document would quickly solve the issue: authentication and authorization can thus be established to a sufficient degree using simple and robust techniques.

However, one of the key factors for the efficiency of this approach is indicated by the words “to a sufficient degree”: the user’s identity is never established with 100% certainty and there is no need to do so, since a complex set of context information is combined to dynamically evaluate the level of trust required and the degree of certainty needed as a consequence.

The situation gets far more complex once we look at digital authentication scenarios: in this context, identification and authentication information must often be established with 100% certainty or simply isn’t established at all. In a binary logic, identity is either established or not, and no such notion as “to a sufficient degree” can ease the task. Certainty thus has to be established to a degree that is almost never required in “real life” environments. Or, as C. Lynch puts it:

“In the digital environment […] computer code is operationalizing and codifying ideas and principles that, historically, have been fuzzy or subjective, or that have been based on situational legal or social constructs. Authenticity and integrity are two of the key arenas where computational technology connects with philosophy and social constructs.” (LYNCH 2000)
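The difference between “to a sufficient degree” and a binary decision can be sketched in a few lines. The weights and thresholds in `librarian_accepts` are invented for illustration and carry no empirical meaning; `system_accepts` stands in for a digital check that either matches exactly or fails. Both function names are hypothetical.

```python
import hmac


def librarian_accepts(name_matches, photo_resemblance, issuer_trusted, required_trust):
    """Graded decision: weigh the evidence against a context-dependent threshold.

    photo_resemblance is a value between 0.0 and 1.0; the weights are arbitrary.
    """
    score = (
        (0.4 if name_matches else 0.0)
        + 0.3 * photo_resemblance
        + (0.3 if issuer_trusted else 0.0)
    )
    return score >= required_trust


def system_accepts(presented_secret, stored_secret):
    """Binary decision: the credential matches exactly or not at all."""
    return hmac.compare_digest(presented_secret.encode(), stored_secret.encode())


# The same imperfect evidence passes a low-stakes check and fails a strict one.
print(librarian_accepts(True, 0.6, True, required_trust=0.8))
print(librarian_accepts(True, 0.6, True, required_trust=0.95))

# A secret that is almost right is simply wrong.
print(system_accepts("s3cret", "s3cret"))
print(system_accepts("s3cre", "s3cret"))
```

The graded check has a parameter for context (`required_trust`); the binary check has none, which is the constraint the passage above describes.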

And the annoying fact is that this problem description is valid not only for persons operating in digital information environments, but for digital information objects as well: the identity and integrity of a printed book are far easier to determine than the identity and integrity of its digital equivalent.

Moreover – and although such information is far more difficult to establish in digital environments – the lack of unambiguous authentication and identification information can completely block a digital information system, while in conventional information environments some flexible strategy for dealing with this lack of information can almost always be devised.

As a consequence, tremendous efforts have to be made in digital information environments in order to determine what kinds of operations a given user may perform on a given object, and this places constraints upon the way such environments function which are almost unknown in ‘conventional’ librarian contexts.


	4.	Differing modes of collaboration: the Cathedral and the Bazaar

The second main area in which pertinent differences can be located (and that must be accounted for) is the way the respective communities co-operate: library catalogues and federated information environments in the WWW have very different traditions of organizing and practising co-operation.

The first striking and almost trivial difference concerns the partners co-operating in such settings: libraries – as different as they may perceive themselves to be from each other – are by far a more homogeneous group of organizations, both in terms of decision making and in terms of user requirements, than the heterogeneous groupings of companies, individual scientists and more or less formally organized parts of the academic community that typically make up the user/production base of federated information services in the WWW.

This basic difference leads to an important secondary observation: rules and guiding principles as well as common policies for information management can be imposed much more effectively in a relatively uniform and closed user group such as the libraries sector, whereas the typical setting within the internet can never be prescriptive to such a high degree.

This basic cultural difference is similar, to some extent, to the differences described by E. Raymond in his essay on “The Cathedral and The Bazaar”, and this is the reason why the title of his essay is echoed in this section’s title. More precisely, Raymond’s paper is basically concerned with differing modes of collaboration and of communication: it compares the traditional community of software engineers, for whom the ‘cathedral’-building metaphor is used, with the open source development community, to whom the ‘bazaar’ metaphor is applied.¹⁰ And having a closer look at modes of collaboration and of communication helps a lot in identifying fundamental differences, this time on the community level.

I will only indicate the directions a closer analysis would have to investigate and that would probably yield very fruitful results, but cannot develop these aspects here, mainly for reasons of space.

If one has a closer look at the respective ways WWW developers and library staff collaborate, one immediately finds that the librarian collaboration model is almost obsessed with rules, whereas such rules hardly play an important role in the WWW environment, where their structural position is taken over by protocols. Likewise, librarian working environments tend to be highly prescriptive compared to the rather experiment-oriented WWW environments. And finally, librarian settings seem to have a strong tendency to establish pre-coordinating frameworks, whereas WWW environments tend to assemble collaborative resources first and then post-coordinate their actual collaborative use.

In the field of communication modes, similar observations can be made. Whereas librarian communities seem to tend towards hierarchical communication models, WWW communities have a rather ‘flat’ information culture. ‘Channelled’ vs. ‘broadband’ perceptions of the communication lines seem to be another relevant distinguishing factor. One could also argue that the librarian way of organizing communication is very much oriented towards aggregation of information, whereas the WWW communication paradigm seems to be heavily oriented towards distribution of information, the two worlds thus focussing on two very different aspects of communication practice.

One could even go as far as to speculate on the differing modes of perception and of mental organization of information units that seem to be at the roots of the respective communities, and might then end up reflecting on the community difference in terms of identity vs. difference. Such heavily philosophical speculations would, however, widen the scope of this paper excessively and are therefore left for a later contribution.

The aim of this section was to raise awareness of the way the respective communities differ ‘culturally’, in their modes of communication and of collaboration. Together with the conclusions from the third section, this should now be a sufficient basis for discussing possible scenarios for the future relation of these two cultures.


	5	Modes of coexistence: future choices and bridging concepts


	5.1	Coexistence? Coexistence!

It should have become clearer by now in what sense a clear recognition of the fundamental differences between both information paradigms helps to better understand the often unexpected effects produced when transposing objects and methods from one world to the other. While such combinations of objects and methods stemming from very different contexts cannot be avoided altogether, and must even be accounted for systematically in ‘hybrid library’ settings, it is still useful to keep in mind the side-effects that such an approach produces.

The recognition of these differences can also help to conceptualize the possible future relation between library catalogues and WWW-based information services without falling back into the bad habit of excessive and fruitless prediction-making mentioned at the beginning of this paper.

In this attempt to take a modest look ahead, two assumptions are made. The first is that both worlds will continue to coexist for quite some time: even though one paradigm of information organization may ultimately supersede the other, such a possible future situation is far beyond the scope of this paper. It is thus assumed that libraries with their catalogues and WWW-based information architectures will coexist. The second assumption is less evident: it is also assumed that real choices can actually be made in organizing this coexistence, and that the co-evolution of both paradigms is not governed by some obscure cybernetic natural law that makes things happen inevitably. The end of this paper is thus devoted to actual choices we could, and should, make in this area.


	5.2	Redundancy, Competition, Convergence, Integration

The possible relations of present and future co-existence can be described using (at least) four different concepts. To begin with, two of these are rather unproductive and ultimately inappropriate. Redundancy is perhaps the least desirable one: modelling the same information objects redundantly in two contexts is expensive, inefficient and carries a high risk of long-term overall inconsistency. This is true for all approaches resulting in redundancy, whether they are based on parallel, unconnected and uncoordinated activities in both environments or on data replication scenarios. Competition is not an appropriate concept either, even though it may appear inevitable in many political contexts where both paradigms compete for the same resources (usually money) and are therefore perceived as functionally and technically competing, although they serve fundamentally different needs.

Two other concepts could be more fruitful and may help to establish productive and realistic objectives. Provided the fundamental conceptual differences between both paradigms are well understood, their relation could evolve in terms of either convergence or integration. Convergence in this context would mean that both worlds move towards the same objectives, getting continuously closer to each other and possibly creating more and more overlapping areas without, however, blending both paradigms altogether: catalogues and WWW-based information systems remain clearly discernible worlds in this approach. Integration, on the contrary, would mean that both worlds are actually blended into something new that embraces both paradigms and serves the needs of their respective communities in one common approach to information modelling.

Examples of all four principles of organizing coexistence can be found in our present professional experience: most readers of this paper will be able to quickly identify examples of redundant, competing, converging or integrating scenarios in their own working context. The author of this paper is convinced that (at least) these four scenarios of co-existence will remain valid options within the next years and that it is up to the stakeholders of both worlds to make their choices among them. Such choices will be triggered by many factors: money, politics and economic interests, to name just a few powerful ones outside the scope of what readers of this paper will typically be able to influence. There are, however, two concepts in the area of information architecture that may help to orient this co-evolution in the direction of convergence or, most likely in this case, of integration, and the promotion of these two concepts would be a very useful contribution of the union catalogue community to the shaping of future co-operative scenarios.


	5.3	Bridging concepts: FRBR and the Semantic Web

Two important bridging concepts in that sense might well be the metadata layering model expressed in IFLA’s “Functional Requirements for Bibliographic Records” (FRBR) and concepts currently taking shape in the “Semantic Web” approach¹¹. The general reason is that both concepts raise the level of abstraction concerning information entities present in both information paradigms high enough to potentially embrace both worlds, and may thus play an effective bridging role. This general assertion will be looked at briefly in more detail to conclude this paper.

Semantic Web technology, to begin with, and more specifically methods based on semantic web ontologies, are likely to make new and productive use of the fine-grained semantic metadata that libraries have traditionally been producing. These metadata could be used for enhancing the taxonomies of semantic web ontologies. Assertions based on the use of classifications and indexing schemes could easily be transposed into taxonomy elements, which in turn greatly broaden the basis to which inference rules can be applied. This results in a much richer taxonomic base for ontological operations and could well generate an ongoing process of library work being fed into semantic web ontologies.
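The transposition of classification assertions into taxonomy elements can be illustrated by a minimal sketch. Everything in it is an assumption made for illustration only: the three-level scheme, the item identifiers and the predicate names `subClassOf` and `type` are hypothetical and are not taken from any actual classification scheme or ontology language.

```python
# Hypothetical classification scheme: notation -> broader notation
# (None marks a top-level class). Not taken from any real scheme.
BROADER = {
    "Science": None,
    "Computing": "Science",
    "Networks": "Computing",
}

# Hypothetical catalogue assertions: (item identifier, assigned class).
ASSIGNMENTS = [("item-1", "Networks"), ("item-2", "Computing")]


def to_triples(broader, assignments):
    """Transpose scheme hierarchy and catalogue assertions into triples."""
    triples = set()
    for narrower, wider in broader.items():
        if wider is not None:
            triples.add((narrower, "subClassOf", wider))
    for item, notation in assignments:
        triples.add((item, "type", notation))
    return triples


def infer_types(triples):
    """Apply one rule until nothing changes:
    (x type C) and (C subClassOf D)  =>  (x type D)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        hierarchy = [(s, o) for s, p, o in inferred if p == "subClassOf"]
        for subject, predicate, obj in list(inferred):
            if predicate != "type":
                continue
            for narrower, wider in hierarchy:
                new_triple = (subject, "type", wider)
                if narrower == obj and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


closure = infer_types(to_triples(BROADER, ASSIGNMENTS))
```

In this sketch a single librarian assertion ("item-1" is classed under "Networks") lets the rule derive that the item also belongs to the broader classes, which is one simple sense in which classification data broadens the basis for inference.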

Likewise, the integration of semantic web techniques in library catalogues, not only for search and retrieval operations but also, for instance, for generating proposals for classification attributes using inference rules, may well help considerably in everyday library work: a rule of the type “If a work by a given author has a given classification element associated to it and if the publication years of another work by an author with the same name are adjacent, the same classification element is likely to apply to this item” would probably yield useful and time-saving classification proposals for newly catalogued items.
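A rule of this type could be sketched as follows. This is only a minimal illustration under stated assumptions: the record structure, the sample data and the reading of “adjacent” as “at most `max_year_gap` years apart” are hypothetical, and matching on the author name alone is exactly as fallible as the rule itself suggests.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical, heavily simplified catalogue record."""
    title: str
    author: str
    year: int
    classifications: set = field(default_factory=set)


def propose_classifications(new_item, catalogue, max_year_gap=1):
    """Propose classification elements for a newly catalogued item.

    Rule: if a catalogued work by an author with the same name carries a
    classification element and its publication year is adjacent (assumed
    here: at most max_year_gap years apart), propose that element.
    """
    proposals = set()
    for work in catalogue:
        same_name = work.author == new_item.author
        adjacent = abs(work.year - new_item.year) <= max_year_gap
        if same_name and adjacent:
            proposals |= work.classifications
    # Only propose elements the item does not carry already.
    return proposals - new_item.classifications


# Invented sample data, for illustration only.
catalogue = [
    Work("Work A", "Doe, Jane", 1998, {"QA76"}),
    Work("Work B", "Doe, Jane", 1990, {"PN45"}),
    Work("Work C", "Roe, Richard", 1999, {"Z699"}),
]
new_item = Work("Work D", "Doe, Jane", 1999)
proposals = propose_classifications(new_item, catalogue)
```

The result is a set of proposals, not assignments: the cataloguer still decides, which keeps the rule a time-saving aid rather than an automatic classifier.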

It is assumed here that semantic web based approaches will primarily contribute to the dynamics of convergence.

The FRBR model, which results in a layered metadata architecture, on the other hand has the strategically important asset of making it possible to combine the metadata architectures typical of library union catalogues (as discussed above in section 3) with the ‘flat’ metadata models typical of WWW information architectures. By consistently applying FRBR-based approaches to the evolution of their catalogues, librarians could substantially reduce the annoying effects that were described above and that today contribute to keeping librarian metadata resources within the ‘hidden web’.

Establishing coherent, unified concepts of what semantic entities, expressions/manifestations and item derivatives actually are, and relating these in one model that makes ‘hybrid’ information settings appropriately conceivable, is one of the major assets of FRBR. From such indications it should be clear that approaches based on the FRBR model probably have a very high integrative potential.
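How a layered model can serve both a catalogue and a ‘flat’ WWW view may be sketched as follows. The four classes follow the FRBR Group 1 entities (work, expression, manifestation, item); the attribute names, the sample shelfmark, the example.org URL and the `flatten` function are assumptions introduced for illustration and are not part of FRBR itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    locator: str  # a shelfmark or a URL (hypothetical attribute)


@dataclass
class Manifestation:
    carrier: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)


def flatten(work):
    """Derive one 'flat' record per item from the layered description."""
    records = []
    for expression in work.expressions:
        for manifestation in expression.manifestations:
            for item in manifestation.items:
                records.append({
                    "title": work.title,
                    "creator": work.creator,
                    "language": expression.language,
                    "carrier": manifestation.carrier,
                    "year": manifestation.year,
                    "locator": item.locator,
                })
    return records


# Invented 'hybrid' example: one work held as a book and as a document.
work = Work("Sample Work", "Doe, Jane", [
    Expression("en", [
        Manifestation("print", 2001, [Item("Shelf A 123")]),
        Manifestation("online", 2002, [Item("http://example.org/sample")]),
    ]),
])
flat_records = flatten(work)
```

The layered description is kept once, and the flat records are derived from it on demand, so that the shelfmark and the URL appear as two locators of the same work rather than as two unrelated descriptions.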

To conclude, it is thus reaffirmed that it does not seem very wise to predict future developments all too strongly, but that librarian and WWW communities would probably at least not make major mistakes by investing major efforts in semantic web technology and in hybrid information models based on the FRBR approach.

Bibliography

[GRADMANN 1998] Gradmann, Stefan: Cataloguing vs. Metadata: old wine in new bottles? In: 64th IFLA General Conference August 16 - August 21, 1998, proceedings. Online at http://www.ifla.org/IV/ifla64/007-126e.htm, also in International Cataloguing and Bibliographic Control 28,4.1998 pp. 88-90

[LYNCH 2000] Lynch, Clifford A.: "Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust," Authenticity in a Digital Environment (Washington, DC: Council on Library and Information Resources, 2000), pp 32-50. Online at http://www.clir.org/pubs/reports/pub92/lynch.html

[MCKIERNAN 2002] McKiernan, Gerry: eProfiles. Innovative Information Systems and Services (Online at: http://www.wils.wisc.edu/events/wworld02/present/eProfiles.ppt)

[RAYMOND 2001] Raymond, Eric S.: The Cathedral & the Bazaar. Beijing [etc.]: O’Reilly 1999.

[SCHOTTLAENDER 2000] Schottlaender, Brian E. C.: “The Catalog as Portal to the Internet” by Sarah E. Thomas. Washington: Library of Congress 2000. Online at:
http://www.loc.gov/catdir/bibcontrol/schottlaender_paper.html

[THOMAS 2000] Thomas, Sarah E.: The Catalog as Portal to the Internet. Washington: Library of Congress 2000. Online at:
http://lcweb.loc.gov/catdir/bibcontrol/thomas_paper.html

[VAN DE SOMPEL 2001a] Van de Sompel, Herbert and Oren Beit-Arie: Open Linking in the Scholarly Information Environment Using the OpenURL Framework. D-Lib Magazine, 7:3 (March 2001). Online at:
http://www.dlib.org/dlib/march01/vandesompel/03vandesompel.html

[VAN DE SOMPEL 2001b] Van de Sompel, Herbert and Oren Beit-Arie: Generalizing the OpenURL Framework beyond References to Scholarly Works: the Bison-Futé Model. D-Lib Magazine, 7:7/8 (July/August 2001). Online at: http://www.dlib.org/dlib/july01/vandesompel/07vandesompel.html

¹	With many thanks to Kim Braun, Peter Cox and Jela Steinerova for their valuable comments and suggestions.
²	When using the term “WWW-based information services” in this paper, I wish to indicate such services as the NASA Astrophysics Data System (ADS) or the NEC Research Institute Research Index as well as more generic services such as Google or Yahoo. ADS and the NEC Index are well presented and discussed at length in a very detailed presentation given by Gerry McKiernan at the WilsWorld ’02 conference (McKiernan 2002). The announcement of this presentation on the conference WWW site makes the following assertions: “In recent years, a number of experimental and operational Web-based information systems and services have emerged that offer advanced and novel features, functionalities, and content. In this presentation, a variety of these innovative services will be profiled, as will their associated technologies. The potential impact of these systems on the development and enhancement of commercial and library information services will also be reviewed and discussed.” However, the latter aspect, although announced, is not really discussed in the presentation itself. The present paper can therefore be seen as a complement to McKiernan’s work, which is very extensive as far as WWW services are concerned but quite restricted as regards libraries. As a consequence, librarian aspects are stressed to a higher degree in the present paper.
³	The term “catalogue” is used as a synonym of “electronic catalogue” throughout this paper, which is thus implicitly restricted to electronic metadata as part of librarian or of WWW-based information infrastructures. The author is aware of the segment of traditional cataloguing reality that is thus deliberately excluded from the scope of this paper; on the other hand, a comparison of traditional card catalogues and WWW-based information services would not have made much sense.
⁴	“Electronic Journals Library” would be a rough English equivalent. EZB can be accessed via http://rzblx1.uni-regensburg.de/ezeit/
⁵	This has been proposed, for instance, by S. Thomas in her reflections on “The Catalog as Portal to the Internet” (THOMAS 2000), which have provoked some interesting discussion, such as the reply by B. Schottlaender (SCHOTTLAENDER 2000).
⁶	Such concepts are presented in detail in the contributions from H. Van de Sompel mentioned in this paper’s bibliography.
⁷	This assumption does not contradict assertions made by the author of this paper earlier in GRADMANN 1998: the distinctions made there concern less the actual bibliographical metadata but rather the respective contexts of use and the originating communities of library and of WWW-metadata.
⁸	For the very same reason the term “digital library” can be considered as intellectually quite dubious: an institution either deals with books (and then can be called a library) or with digital information objects (and why should it then be called a library?).
⁹	A very sound introduction to the issues of authenticity and integrity is given in Lynch 2000.
¹⁰	Raymond then goes further than I want to go here: he proclaims the bazaar model to be more powerful than the cathedral model, whereas I have no intention of transposing that conclusion to the context of this paper. This is where the reference to Raymond’s paper has its clear limits.
¹¹	This assumption is by no means meant to be exhaustive: there surely are more examples of bridging concepts and the author just tried to identify two prominent ones.