Information for libraries



Exchangeable formats of bibliographical data: their present transformation

Summary: Exchangeable formats of bibliographical data have been used in library practice since the 1960s, and despite the decades that have passed no substantial change has taken place, although librarians have called for it. The deficiencies and the obsolescence of the MARC-type formats are obvious. The pressure for change has been increasing since the turn of the century. This article summarizes the activities in this field and outlines possible further development. An analysis of various projects and of the respective questionnaires makes it possible to estimate the direction this field is likely to take. The activities tend towards the linked data publication model, while retaining present-day cataloguing practice.


Keywords: exchangeable formats, bibliographical data, linked data, BIBFRAME, Schema.org, cataloguing

PhDr. Klára Rösslerová / Filozofická fakulta. Univerzita Karlova v Praze (Faculty of Arts, Charles University in Prague), náměstí Jana Palacha, 2, 116 38 Praha 1, Česká republika

1 Introduction

The most widely used exchangeable format of bibliographical data today is MARC 21, and to a lesser extent also UNIMARC (Universal MARC format). For quite some time opinions have been voiced that formats of this type are obsolete, yet it is not possible simply to adopt a different format, transfer huge amounts of data and adapt all library systems working with the existing structure. Bibliographic data in these formats have been created, stored and distributed over decades. However, this comfortable period in global librarianship evidently belongs to the past.

In fact, criticism of the MARC-type format, or rather an analysis of what specifically is outdated or superfluous in it, is of no special importance. It is much more necessary to focus on the development of library science (or rather of the web in general) and to suggest what structure of data libraries should produce, keeping in mind the primary goal of the library: offering services to its users as quickly and easily as possible, exactly as users are accustomed to on the web. According to opinion polls, well over 80% of users begin their search on the free web, and only afterwards do they look for the desired item in library catalogues. If libraries wish to follow these trends and adapt to them (which they readily do in the field of publicity and general communication on social networks), they should arrange access to their valuable data so that users can find them with common search engines.

The path leads to maximum integration of data from diverse sources, not only non-commercial but also commercial ones. Development thus leads out of the purely library, i.e. non-profit, sphere towards the margin of business: library data lose their exclusive status and also become an object of interest for companies that can add references to their commercial products to the library data.

If a link between the library sphere and the commercial sphere is established, it can be beneficial for all parties involved: library data are very valuable, and their creation requires sophisticated and expensive work, not to mention the specialized education and practice of librarians.

The present article aims at forecasting possible future development in the field of exchangeable formats of bibliographical data. Is MARC 21 going to be replaced with another format? Is it going to be replaced with one format only? Or can we envisage the future use of several formats according to purpose: one for the exchange and presentation of data, another for storage in various systems or for exchange between libraries?

1.1 Starting points

As in life, in our field too it is more important to look forward than to lament the past. However, proposing a reasonable future for exchangeable formats requires an analysis of the present condition, identifying its errors or insufficient solutions, and using that analysis to define the requirements that ought to bring us to the desired future state. Roy Tennant (TENNANT, 2002 and 2002b) has handled this subject in his articles in a comprehensive and yet concise way. The foundation according to him can be described as follows:

Keep what is good.
Achieve a high level of granularity.
Interlink by way of references.
Make use of hierarchic relations.
Get free from physical materialization.
Achieve expandability, flexibility.
Achieve interlinking.
Obviously the starting point will be the semantic web.

1.2 Semantic web

Before explaining the concept of the semantic web, it seems appropriate to explain the adjective semantic, or the word semantics. Semantics, as a linguistic concept, deals with the meaning of words. In this sense a semantic web is one that is understandable to machines. This does not mean that machines should become artificial intelligences able to read data and grasp their meaning the way humans do. It means that information in the semantic web is structured so that machines can recognize it, thanks to markup used to enable this process of recognition.

The idea of the semantic web first appeared in the work of Tim Berners-Lee, the inventor of the web, in 2001. Berners-Lee postulated that the tangle of files interconnected with hypertext references should be replaced with a structured database, i.e. a web of data instead of a web of documents, which should be achieved by means of hidden markup supplying information about the meaning of the data contained in the documents. He thus wanted to solve two problems at once: on the one hand, the existence of data that are not expressed in HTML (such as databases) and are therefore not retrievable by classical search engines, i.e. the problem of the deep web; on the other hand, the reliance on keywords alone during search, irrespective of the actual meaning of the content (KONSTANTINOU, 2015).

Reaching this goal, the creation of a web of linked data (see below), is the focus of the World Wide Web Consortium (W3C), the international consortium that develops web standards and whose founder and director is Berners-Lee. The semantic web is based upon the RDF (Resource Description Framework) technology (World Wide Web Consortium, 2013).

1.3 Linked data

The concept of linked data also first appeared in the work of Berners-Lee, in 2006. Linked data is a publishing model for structured data on the web that is based upon web standards, such as HTTP (Hypertext Transfer Protocol) and URI (Uniform Resource Identifier), as well as upon semantic web technologies, such as RDF (Resource Description Framework) (MYNARZ, 2010). OCLC (Online Computer Library Center) has published a very illustrative free video on its website, explaining the character of linked data and the advantages and options this concept brings to libraries.

Applying linked data means creating links between data from various resources. It can connect highly diverse data created by organizations in different parts of the world that have no other connection with each other, or, conversely, data that are part of heterogeneous systems within one organization (BIZER, 2009). Bizer, Heath and Berners-Lee expand the idea of the web of data into “the web of things described worldwide according to data on the web” (2009).

A subset of linked data is linked open data, which means publishing structured data, but in an open format, i.e. data available for everyone (KONSTANTINOU, 2015). Such data can be re-distributed, re-used and can become the basis for further utilization, including commercial purposes.

1.4 Advantages of using the linked data method

The work of Heath and Bizer (HEATH, 2011) summarizes the potential of linked data:

The RDF principle can be used by anybody worldwide to link anything with anything.

Any URI allows the users to retrieve complementary information.

Information from diversified sources can be simply interlinked by combining two triples (subject-predicate-object) in one single graph.

RDF enables the representation of information expressed by diversified schemes in one single graph.
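
The interlinking described in the points above can be illustrated with a minimal sketch. It does not use a real RDF library: a graph is modelled simply as a set of (subject, predicate, object) triples, and all URIs and the two "sources" are hypothetical placeholders chosen for this illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical URIs, used only for illustration.
PERSON = "http://example.org/person/jan-hrebejk"
FILM = "http://example.org/film/pelisky"
NAME = "http://example.org/vocab/name"
DIRECTOR = "http://example.org/vocab/director"
TITLE = "http://example.org/vocab/title"

# Source A: a (hypothetical) library authority file.
library_data: Set[Triple] = {
    (PERSON, NAME, "Jan Hřebejk"),
    (FILM, DIRECTOR, PERSON),
}

# Source B: a (hypothetical) film database using the same URIs.
film_data: Set[Triple] = {
    (FILM, TITLE, "Pelíšky"),
    (FILM, DIRECTOR, PERSON),
}


def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Combine several graphs into one; identical triples collapse."""
    merged: Set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def titles_directed_by(graph: Set[Triple], person_uri: str) -> List[str]:
    """Return titles of all films whose director is the given person."""
    films = {s for (s, p, o) in graph if p == DIRECTOR and o == person_uri}
    return sorted(o for (s, p, o) in graph if p == TITLE and s in films)


merged = merge(library_data, film_data)
print(titles_directed_by(merged, PERSON))
```

Neither source alone can answer the question "which titles did this person direct"; the answer only emerges once both sets of triples share one graph and the same identifiers.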

Let us offer an illustrative example: a search for the expression “Jan Hřebejk”, a Czech filmmaker, in the Google search engine (see Fig. 1). Note the basic information about Jan Hřebejk with a group of his portraits and further pictures on the right-hand side, combined with the text results on the left-hand side.

Fig. 1 Screenshot depicting the results of a search for the expression “Jan Hřebejk”

1.5 The way to semantic web in the field of library science

During the first decade of the new millennium library experts also began to explore the semantic web with increasing frequency. In 2008 Martha M. Yee, a librarian at the University of California Film & Television Archive, introduced her own cataloguing rules and an RDF model in a document wherein she maintains that the bibliographical universe is (too) complex, and that the role of a catalogue should therefore consist in reducing this complexity in order to simplify searching for the user. Librarians ought to mark up individual bibliographical data so that systems can handle them and present them to the user in an easily understandable form, displaying, upon one single click, all further works of the given author, all further editions, all other items etc. Moreover, all this could be done by commercial search engines of the Google type instead of library catalogues (YEE, 2008).

In 2006 the Library of Congress in Washington commissioned Karen Calhoun to prepare a report on changes in the cataloguing process and catalogues, library rules and formats, and more generally on trends in information services and librarianship. Karen Calhoun summed up the essential requirements and changes in a number of points, of which the following appear relevant for the present work: more stress upon shared catalogues, shared approaches, catalogues as a means of supporting digitization projects, and further preparation for linking.

Particularly interesting is the idea behind the proposal, namely that the exchangeable format should be developed towards the MARC-XML format while leaving the actual structure of the MARC format unchanged (CALHOUN 2006). There are voices opposing Karen Calhoun's recommendation, such as that of Birghid Gonzales, who sums up in her article (GONZALES 2014) that MARC-XML is usable simply because there is nothing better at the moment.

In June 2006 the ALA (American Library Association) conference took place, where Deanna Marcum, vice president of CLIR (Council on Library and Information Resources) and later deputy director for library services of the Library of Congress in Washington, formulated the requirement that cataloguing should be seriously reconsidered in the light of progress in information technologies. She suggested creating a working group that would deal with the problems from a sufficiently detached viewpoint. Such a group, the Library of Congress Working Group on the Future of Bibliographic Control, was founded in the same year. It set itself the goal of proposing a different method of bibliographical control, one that does not start from the present state of things: “We should not try to correct the existing systems, but rather pretend that we have only just returned to this planet from Mars…” (Library of Congress, 2006).

The working group, headed by Dr. José-Marie Griffiths, Dean of SILS (School of Information and Library Science) of the University of North Carolina at Chapel Hill, has fifteen members from the educated public (cataloguing experts) and from among librarians, and a representative of Microsoft is also present. The group has declared its willingness to collaborate with the National Federation of Advanced Information Services (http://nfais), formerly the National Federation of Science Abstracting and Indexing Services, whose members include outstanding libraries but also fully commercial entities, such as EBSCO or Elsevier. This makes it obvious that the working group pays due attention not only to library data, but also to commercial catalogues (databases), as it later clearly declared (an intention to collaborate with the commercial sphere and with actual users).

1.6 Report On the Record

In 2008 the working group presented its report On the Record (Library of Congress, 2008b). The report invites the professional public (“calls it to action”) to participate in solving five highlighted topics. The premises are as follows:

Bibliographical control is not restricted to the cataloguing process, but pertains to all types of materials that are made available by the library to variegated groups of users from all sorts of locations.

The bibliographical universe does not cover only libraries, producers, databases and publishers, but also sellers, distributors and all other possible groups of users, irrespective of frontiers.

The Library of Congress, considering the development of information technologies, should not play the role of a single possible producer of bibliographical records in the United States of America, and should not be perceived as such.

The five fields, summarized by the Report as topics for discussion, are the following:

Improving the effectiveness of bibliographical production by cooperation and sharing.

Opening up the access to types of documents that are unavailable at present.

Accepting the fact that users of bibliographical information are not only persons, but also machines (applications).

Adapting to the present day trends and enabling insertions in the records, such as evaluations of the users.

Getting consistently educated.

The Report brings some essential recommendations. The majority of them concern the distribution of work and responsibility, and their main objective is to do away with duplication of work, which is costly and entirely superfluous in the light of present technological progress. One of the prerequisites is to stop insisting on meticulous observance of the American library standards. The existing cataloguing standards should be analyzed and possibly revised so as to be applicable outside the domain of libraries as well. In addition, it is necessary to create conversion programs that enable data to be shared across different data producers and distributors, so as to meet the needs of all interested parties: not only libraries, but also various information services, such as Amazon and IMDb (a portal providing access to databases of films, TV series and related content).

As regards the future of the formats of bibliographic data, however, there is one prerequisite, one basic change that is indispensable for all the above: as long as the library world uses the forty-year-old and by now quite unsuitable MARC 21 format, it cannot cooperate effectively with the other groups of producers and distributors of data and cannot effectively pass its data outside library systems, and is thus unable to meet the vision of maximum cooperation and distribution. That is why it is necessary to create a future record carrier that enables unhindered communication between library systems and is suitable not only for libraries, but also for various other user communities.

In addition to the report On the Record the working group also created its website (http://www.loc.gov/bibliographic-future), and three large working meetings with the professional public took place: a meeting with users of bibliographic data was organized under the leadership of Google, the American Library Association convened a meeting focusing on data structure, and the Library of Congress arranged a meeting whose main theme was the economic side of bibliographical systems (Library of Congress, 2008b). The employees of the Library of Congress, headed by Deanna Marcum, then formulated an official Answer (Library of Congress, 2008b), giving their unambiguous support to the Report. They endorsed a supportive policy on open access, bearing in mind small and insufficiently financed institutions. They also call for completing the cataloguing of collections not yet processed and making them available in the online catalogue, and they expect the working group to propose a way of handling the collections retrospectively. The text of the Answer goes step by step through the fields singled out in On the Record, analyzing them, adding information about the stage the Library of Congress has reached at the given moment (what it has or has not begun to solve), and where possible suggesting existing solutions. The individual proposals reveal a clear trend towards interlinking the commercial and non-commercial spheres, namely by sharing bibliographical data between libraries and commercial entities such as Amazon, as well as towards rationalizing work within the Library of Congress itself by analyzing separate operations and verifying whether there is any duplication (be it in the creation of CIP, the assignment of ISSN etc.). Finally, concise technical solutions are proposed for data sharing, data collection etc.

The Report On the Record, however, also provoked negative reactions in the library community, not because of its actual contents, but because it left aside the trend of open access, in this case omitting the principle of linked open data. Jonathan Gray of the Open Knowledge Foundation formulated a manifesto in response to On the Record, in which he calls on libraries to open themselves to the world. His postulate is based upon the premise that bibliographic records, being part of the cultural heritage, should be accessible to the broad public for further use without any restrictions, whether commercial or non-commercial. By way of example he mentions the use of library data for the creation of websites intended for enthusiastic readers, for preparing all sorts of statistics for scientists, for journalists etc. (Open Knowledge Foundation, 2011). The document bears the signatures of 157 librarians, information experts and private persons from all over the world; employees of Italian universities prevail among the librarians, whereas the rest come from various regions. The common denominator appears to be open access: the signatories are employees of institutions functioning on the principle of open access or declaring it publicly. However, no institution as a whole presents itself as an advocate of the initiative.

1.7 Individual projects

Since 2011 library science has shown practical progress in the field of linked data. An alternative to the MARC-type formats has been under development at the Library of Congress in Washington, and OCLC has also joined in by experimenting with the Schema.org model.

1.7.1 Research of OCLC

In summer 2014 OCLC initiated a survey of how leading libraries, archives, metadata services and digital libraries provide information in the form of linked data (OCLC Research). The enquiry consisted of six simple questions:

Who provides data in form of linked data

Examples

Who makes use of such data

Why are data offered in this form

Technical details

Advice from providers

The organizers of this survey obtained a total of 96 relevant responses from 15 countries. It is interesting that some institutions only produce linked data, while others only consume them. About one half represent both the producing and the consuming side. The institutions come from the United States of America, Australia, Canada, France, Germany, Ireland, Italy, the Netherlands, Norway, Singapore, South Korea, Spain, Switzerland and Great Britain. Although some of the projects are intended for non-public records, there are some giants among them: the OCLC WorldCat catalogue, containing over 2 billion records (all in the form of linked data) and at the same time the most frequently used one (with 16 million inquiries per day), the catalogue of the Library of Congress in Washington, and the British National Bibliography. Beyond bibliographic catalogues, the Dewey Decimal Classification (with conversion executed by OCLC) and VIAF (Virtual International Authority File) are also provided in the form of linked data.

The BIBFRAME bibliographic framework model of the Library of Congress and the company Zepheira, and Schema.org as used by OCLC (GODBY), rank among the most outstanding linked data activities within the sphere of libraries.

1.7.2 BIBFRAME of the Library of Congress in Washington

In May 2011 the Library of Congress in Washington officially announced the foundation of the Bibliographic Framework Initiative (http://www.loc.gov/bibframe and www.bibframe.org), whose purpose consists in analyzing:

The bibliographical description as such

The actual creation of data

The exchange of data including the protocols of exchange

Following on from this analysis, the aim is to replace the MARC-type formats with BIBFRAME, the Bibliographic Framework (Library of Congress, 2012d).

In its report A Bibliographic Framework for the Digital Age (Library of Congress, 2011) the Initiative follows in the footsteps of the report On the Record, but at the same time it already refers to the link to the RDA cataloguing rules, which are progressive in many respects but whose potential cannot be fully utilized in combination with the MARC-type format.

Report A Bibliographic Framework for the Digital Age

The report defines the following requirements for the bibliographic framework environment: it should be independent of the cataloguing rules; data regarding the entity, authority data, data concerning rights, the physical description etc. should be encoded; textual data linked with URI identifiers should be used (instead of text only); cataloguers will not work directly with the format (the bibliographic framework) the way they were accustomed to with the MARC format; the bibliographic framework will be intended for libraries (institutions) of all categories and specializations; the MARC 21 format will continue to be maintained for the time being, but it will be supported only with regard to the implementation of the RDA rules; the framework will be compatible with records saved in the MARC format; and the transfer of records from the MARC 21 format to the new bibliographic environment and vice versa will be enabled.

The report highlights the fact that adapting to the web environment and adopting the principles and mechanisms of linked data, with RDF as the default model, will offer users simplified access to information, while opening the door for libraries to more effective storage and use of data, not only now but especially in the future, as libraries will be able to draw on the knowledge and skills of experts familiar with current data handling and software development. Libraries will thus adapt to the present market while reducing their costs.

According to this Report the Library of Congress has specifically allotted funds to grants for establishing national and international working groups with the aim of proposing scenarios of collaboration, revising the ontology in use and creating a new one for the description of resources.

BIBFRAME model

BIBFRAME is a conceptual model defining four entities: work, instance, authority and annotation (Library of Congress, 2012d).

Work – a resource reflecting the conceptual essence of the resource being catalogued.

A total of eleven types (subclasses) of work have been described:

audio, cartography, dataset, mixed material (several types of data, yet not requiring software), moving image, multimedia, notated movement (graphically recorded, e.g. dance), notated music (graphically, not acoustically, recorded), still image, text and three-dimensional object.

Instance – an individual, material embodiment of the work. Ten types (subclasses) of instance have been described: archival object, collection, electronic document, integrating resource, manuscript, monograph, multipart monograph, print, serial and tactile document.

Authority – a resource reflecting key authority concepts that have a defined relationship to the work and the instance. Four types (subclasses) of authority have been described: agent (in the sense of person, institution etc.), place, time and topic.

Annotation – a resource that complements our information about other resources. Five types (subclasses) of annotation have been described: cover art (a reference to the cover image), holdings information, review, summary (abstract etc.) and table of contents.

Properties of entities

Each of the above entities has the following features:

a) authorized access point, a controlled string of characters serving for identification, such as a unique title or name

b) identifier, a controlled string of characters serving to identify the entity unambiguously, such as a URI or ISBN

c) label, a text string expressing the value of the property

d) related to, any relation between resources (expressed by a URL, Uniform Resource Locator)

In addition to these common properties, properties specific to the individual entities are defined.
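
The four entities and their shared properties can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and field names are chosen for this example and are not the identifiers of the actual BIBFRAME vocabulary, and all titles, names and URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """The four properties the text lists as common to every entity."""
    label: str
    authorized_access_point: Optional[str] = None
    identifier: Optional[str] = None
    related_to: List[str] = field(default_factory=list)  # URLs of related resources


@dataclass
class Authority(Resource):
    kind: str = "agent"  # agent, place, time or topic


@dataclass
class Work(Resource):
    work_type: str = "text"
    creators: List[Authority] = field(default_factory=list)


@dataclass
class Instance(Resource):
    instance_type: str = "print"
    instance_of: Optional[Work] = None  # the work this instance embodies


@dataclass
class Annotation(Resource):
    kind: str = "summary"
    annotates: Optional[Resource] = None  # the resource being complemented


def instances_of(work: Work, instances: List[Instance]) -> List[Instance]:
    """Return all instances that embody the given work."""
    return [i for i in instances if i.instance_of is work]


# Hypothetical example data.
author = Authority(label="Example Author", kind="agent")
work = Work(label="Example Title", identifier="http://example.org/work/1",
            creators=[author])
printed = Instance(label="Example Title (print)", instance_type="print",
                   identifier="http://example.org/instance/1", instance_of=work)
ebook = Instance(label="Example Title (e-book)", instance_type="electronic",
                 identifier="http://example.org/instance/2", instance_of=work)
review = Annotation(label="A reader's review", kind="review", annotates=work)

print([i.instance_type for i in instances_of(work, [printed, ebook])])
```

The point of the separation is that the description of the work (title, creator) is recorded once, while each physical or electronic embodiment only points back to it.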

BIBFRAME format in practice

Testing of the BIBFRAME format has already begun, predominantly at American libraries. The Bibliographic Framework Initiative has published the following list of participating test libraries: British Library, German National Library, George Washington University Library, National Library of Medicine (USA), OCLC, Princeton University Library and Library of Congress. The result was the creation of the BIBFRAME Vocabulary, which is being continuously improved, and the conversion of a few million records to the BIBFRAME format.

All necessary materials and applications for the conversion of data are freely accessible on the website of the Library of Congress in Washington; libraries may use them for their own purposes. In 2015 the Library of Congress also announced the BIBFRAME Editor (available at the same place), an editor intended for direct cataloguing into the BIBFRAME structure. The editor contains prepared templates for processing in accordance with the RDA rules for monographs, musical materials, serials, cartographic documents, Blu-ray/DVD and audio CDs. Each of these categories offers a choice between instance and work.

At present some American libraries have had their complete catalogues converted into linked data on a commercial basis by the company Zepheira, which participated in the development of the BIBFRAME format. Over three million bibliographic records of libraries such as those of Boston University or the University of Manitoba are in the process of conversion. The program for conversion from the MARC XML format is accessible on the web free of charge.
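
What such a conversion involves can be suggested by a minimal sketch. It is not the converter of the Library of Congress or of Zepheira: the record is reduced to a simple dictionary (repeated fields and indicators are ignored), and the predicate URIs, the base URI and the sample values are hypothetical. Only the MARC 21 tags themselves (100 main entry, personal name; 245 title statement; 300 physical description; 650 topical subject) are taken from the format.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical mapping from (MARC tag, subfield code) to a predicate URI.
FIELD_MAP: Dict[Tuple[str, str], str] = {
    ("100", "a"): "http://example.org/vocab/creator",
    ("245", "a"): "http://example.org/vocab/title",
    ("650", "a"): "http://example.org/vocab/subject",
}


def marc_to_triples(control_number: str,
                    fields: Dict[str, Dict[str, str]],
                    base_uri: str = "http://example.org/record/") -> List[Triple]:
    """Turn the mapped subfields of one simplified record into triples.

    Subfields without an entry in FIELD_MAP are skipped, so the output
    may carry less information than the source record.
    """
    subject = base_uri + control_number
    triples: List[Triple] = []
    for tag, subfields in fields.items():
        for code, value in subfields.items():
            predicate = FIELD_MAP.get((tag, code))
            if predicate is not None:
                triples.append((subject, predicate, value))
    return triples


# Hypothetical sample record.
sample_fields = {
    "100": {"a": "Example, Author"},
    "245": {"a": "Example title"},
    "300": {"a": "250 p."},          # not mapped, therefore dropped
    "650": {"a": "Example subject"},
}

triples = marc_to_triples("rec-0001", sample_fields)
for triple in triples:
    print(triple)
```

A production converter would additionally replace text values such as the author's name with URIs of authority records, which is where the actual linking takes place.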

One of the pioneers in this field in Europe is the German National Library, which has offered bibliographic records conforming to the RDF standard since 2010. Although this library uses its own publishing model based upon an extension of the Schema.org model, it also recognizes BIBFRAME.

1.7.3 Schema.org OCLC

OCLC began considering the possibility of presenting its data as linked data at the same time as the Library of Congress in Washington, i.e. in 2011. Unlike the Library of Congress, however, OCLC did not begin developing its own format, but adopted the Schema.org vocabulary. Schema.org is a joint activity of the companies behind the search engines Bing, Google, Yahoo! and Yandex.

This vocabulary was gradually extended with a version suitable for libraries. The linked data model developed by OCLC defines entities similar to those of BIBFRAME: work, instance, organization and person. A comparison shows that BIBFRAME has been prepared specifically for the purposes of libraries, being based upon existing formats and focusing upon achieving compatibility with search engines, so as to make data and records saved in library databases easily searchable and accessible to users in the way they are accustomed to when looking for information today. Schema.org has no such narrow focus. Its objective is simple searching of data irrespective of origin; accordingly, it is not bound by any rules for the description of library resources and, compared with BIBFRAME, it is “flatter”. An extension of the publishing model for the library environment is the subject of intensive work by the W3C Schema Bib Extend Community Group.

Representatives of OCLC and of the Library of Congress in Washington have been dealing intensively with the differences between, and the compatibility of, both schemes. Although the schemes overlap in their cores (the overlap concerns the extension of Schema.org for libraries, BibExtensions), certain parts of the schemes differ because they have been developed for different groups (GODBY).

Schema.org in practice

As mentioned, OCLC provides its records based upon the Schema.org vocabulary through the union catalogue WorldCat. The publishing model of the German National Library also came into being as a modification of Schema.org.
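
A description based on the Schema.org vocabulary can be sketched as follows. The sketch builds a JSON-LD document using the Schema.org types Book and Person and the properties name and author; the helper function, the URI and the sample values are hypothetical, and the markup actually published in WorldCat is considerably richer than this illustration.

```python
import json
from typing import Any, Dict


def book_to_jsonld(uri: str, title: str, author_name: str) -> Dict[str, Any]:
    """Describe one book with Schema.org terms, serializable as JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": uri,  # hypothetical identifier of the described resource
        "name": title,
        "author": {"@type": "Person", "name": author_name},
    }


doc = book_to_jsonld("http://example.org/book/1", "Example title", "Example Author")
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

The "flat" character of the vocabulary is visible here: the book is described directly by a handful of general properties, without the separate work and instance layers that BIBFRAME prescribes.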

1.8 Activities of libraries

In Sweden the issues of linked data are dealt with by the Swedish National Library, which is in charge of the Swedish union catalogue LIBRIS. This catalogue provides access to records from 165 Swedish libraries. Thanks to the independent activity of the Swedish libraries, which began as early as 2008, over 6 million bibliographic records are available as linked data to common search engines (SÖDERBACK). When mapping the MARC format to RDF, the librarians of the Swedish National Library were guided by a very simple idea: “…it is better to bring something immediately, rather than sticking to the detail and waiting for perfection” (MALMSTEN).

Finland is another Scandinavian country that wishes to open up its catalogues to the world. The Finnish National Library has begun a thorough mapping of its records. Its activity is supported by a decision of the Finnish government of 2011 ordering its institutions to provide access to public information resources. For this purpose the Open Data Programme was declared in 2013. At present the Finnish National Library is still at the beginning of this project (2015-2017), which should result both in data published as open data and in complete documentation for libraries.

Although the Finnish librarians see the future in BIBFRAME, they decided in favour of their own structure (HYVÖNEN).

Government pressure also prompted British librarians to act. They started to tackle the open data publishing model in 2009. The British National Bibliography, with well over 3 million records, was chosen for this work and made available in June 2011 (DELIOT).

Among the American university libraries there is a separate project called Linked Data for Libraries (LD4L). This joint project of the Cornell University Library, the Harvard Library Innovation Lab and the Stanford University Libraries is supported by a two-year grant of one million USD from the Mellon Foundation. The project focuses on creating a model for publishing structured data that fully reflects the special needs of university libraries while being based on the general BIBFRAME (LD4L).

2 The future

This article aims to provide information about current trends in the field of exchangeable formats of bibliographic data and to predict their possible development. The above shows that the life cycle of bibliographic data is changing, or has already changed. Although librarians still predominantly use MARC 21 for cataloguing and libraries distribute data among themselves in this format, conversion to a linked data structure has been added at the end of this chain. Thanks to this structure, the valuable information created by librarians can finally leave the library catalogues and databases, nowadays often called silos, and reach the open web, where end users (readers) can find the desired item at their first attempt when searching with a Google-type search engine.

To estimate future development, the author has chosen one of the possible quantitative methods, namely an enquiry by electronic questionnaire. An advantage of this method is that foreign experts can be addressed irrespective of where they are based.

Such a poll with open questions is the basis of the Delphi method, which makes it possible to ascertain the opinions of a group of experts independently of each other.5 The survey was carried out on two samples, each consisting of a group of professionals: the first was the IFLA Cataloguing Section Standing Committee, the second the members of the BIBFRAME e-mail discussion list. The questionnaire survey was carried out in January and February 2016.

2.1 Questionnaire

A freely accessible Google form was used for the questionnaire, and a link to it was distributed by e-mail. The questionnaire was based on the following hypothesis: the library community tends to consider the MARC 21 format outdated, and calls for its replacement are frequent. At the same time, the current trend is to publish data on the web, including library bibliographic information, using linked data. However, the preceding text shows that the libraries undertaking this journey have taken different paths. My questionnaire therefore contained three open questions:

How soon will MARC 21 be replaced with a different type of bibliographic data format (in your country)?

Will a linked data format be used for the exchange of bibliographic data?

Will there be one (leading) structure of linked data or many versions developed by libraries?

2.2 Survey sample

As indicated, the questionnaire was sent to two groups of professionals. The first group (sample A) consisted of the fifteen members of the Standing Committee of the IFLA Cataloguing Section, together with the corresponding members, the chair, the secretary and the information coordinator. In total, eighteen persons were addressed.

These members are experts from all over the world (at present they represent Denmark, the Vatican, Egypt, Argentina, France, the Czech Republic and other countries). The IFLA Cataloguing Section deals with cataloguing in the broadest sense, proposing and developing cataloguing rules, guidelines and standards. In pursuing its aims, it collaborates closely with the International Organization for Standardization (ISO). The Cataloguing Section can thus directly influence the codification of standards, and the attitudes and opinions of its members are therefore important for forecasting the future.

The second group (sample B) consisted of the subscribers to the public BIBFRAME e-mail discussion list (Listserv), administered by the Library of Congress in Washington. The list has 1744 registered members.

2.3 Results

Answers were received from 12 persons of the survey sample A and from 30 persons from sample B.

Question No 1 – How soon will MARC 21 be replaced with a different type of bibliographic data format (in your country)?

The answers show that most respondents expect a change (35 persons, i.e. 90%).

Tab. 1 Reply to the question whether format MARC 21 will be replaced with a different type of bibliographic data format

|     | sample A | sample B | total | %  |
|-----|----------|----------|-------|----|
| yes | 8        | 27       | 35    | 90 |
| no  | 4        | 0        | 4     | 10 |
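The percentages in Tab. 1 can be reproduced with a few lines of arithmetic. The sketch below is only a check of the published figures; the function name `percentages` is hypothetical, and it assumes that the shares are taken over the 39 answers classified in the table (35 + 4), not over all 42 returned questionnaires (12 + 30).

```python
def percentages(counts):
    """Return each answer's share of all classified answers, rounded to whole percent."""
    totals = {answer: sum(by_sample.values()) for answer, by_sample in counts.items()}
    classified = sum(totals.values())
    return {answer: round(100 * total / classified) for answer, total in totals.items()}


# Counts taken from Tab. 1 (sample A, sample B).
tab1 = {
    "yes": {"A": 8, "B": 27},
    "no": {"A": 4, "B": 0},
}

print(percentages(tab1))  # {'yes': 90, 'no': 10}
```

The same function can be applied to the counts of the other tables, bearing in mind that their percentages may be based on a different number of answers.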

Sample A

38% of the respondents from sample A believe that the replacement of MARC 21 with another type of format is already in progress. A further 37% expect its replacement within 10 years.

Tab. 2 Reply of respondents from sample A to the question how long it may take to replace the MARC format with another type of bibliographic data format

|                    | sample A | %   |
|--------------------|----------|-----|
| in process already | 3        | 38  |
| 5 years            | 2        | 25  |
| 10 years           | 1        | 12  |
| later              | 0        | 0   |
| different          | 2        | 25  |
| total              | 8        | 100 |

Sample B

15% of the respondents from sample B think that the replacement of MARC 21 with another type of format is already in progress. A further 67% expect its replacement within 10 years.

Tab. 3 Reply of respondents from sample B to the question how long it may take to replace the MARC format with another type of bibliographic data format

|                    | sample B | %   |
|--------------------|----------|-----|
| in process already | 4        | 15  |
| 5 years            | 10       | 37  |
| 10 years           | 8        | 30  |
| later              | 4        | 15  |
| different          | 1        | 3   |
| total              | 27       | 100 |

One respondent added that in her home library (the Library of Congress in Washington) the linked data structure is also being used on a trial basis for the primary creation of records (cataloguing). A total of four respondents made the transition conditional on an impulse from one of the leading libraries (the Library of Congress in Washington or the British Library).

Question No 2 – Will the linked data structure be used for the exchange of bibliographic data?

Only 5 respondents gave a strictly negative answer to this question. Two further respondents (from sample B) replied no on the grounds that linked data are intended for publishing information, not for exchanging it in this form. The remaining respondents (35) gave a positive response, and two of them added that there would always be a choice among different formats. One respondent said that he did not know. The responses from sample A suggest that IFLA may act in support of applying the linked data structure to the exchange of bibliographic data as well.

Tab. 4 Response to the question whether the linked data structure will be used for the exchange of bibliographic data

|            | sample A | sample B | %   |
|------------|----------|----------|-----|
| yes        | 11       | 24       | 83  |
| no         | 0        | 5        | 12  |
| don’t know | 0        | 1        | 2.5 |
| different  | 1        | 0        | 2.5 |
| total      | 12       | 30       | 100 |


Question No 3 – Will there be one (leading) structure of linked data, or different variants developed by libraries?

The respondents largely agreed on this question. Only eight of them (19%) replied unambiguously that there would be a single linked data model (shown in the table as “yes”), and the answers received suggest that BIBFRAME prevails among them. One respondent did not know and one did not answer. The remaining participants (34 respondents, i.e. 78%) shared the opinion that there would be a number of different versions. Most respondents supplemented their answers with comments that are very interesting for the purpose of this paper; the largest group of respondents believes that BIBFRAME may win, but that it may exist in diverse local versions or modifications depending on its actual purpose. In sample A, i.e. among the persons responsible for the codification of standards, only two participants replied that a single standard can be expected. It therefore seems that IFLA is unlikely to press for unification at present.

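Tab. 5 Response to the question whether there will be one (leading) linked data structure

|            | sample A | sample B | %   |
|------------|----------|----------|-----|
| yes        | 2        | 6        | 19  |
| no         | 9        | 23       | 76  |
| don’t know | 0        | 1        | 2.5 |
| different  | 1        | 0        | 2.5 |
| total      | 12       | 30       | 100 |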

2.4 Conclusion of the enquiry

The responses to the questionnaire make it possible to answer the question whether MARC 21 is really being replaced, or will be replaced, by another format of bibliographic data. Most respondents (90%) agree that it is being replaced or will be replaced, most of them expecting this within 10 years. 83% of the respondents think that MARC 21 will be replaced with a linked data structure. Only two participants draw attention to the fact that the linked data structure is intended for publishing bibliographic information rather than exchanging it. Most respondents (76%) believe that diverse structural variants can be expected.

It is interesting to observe the responses of sample A, i.e. the members of the Standing Committee of the IFLA Cataloguing Section, who can directly influence the creation of cataloguing standards. Since the majority of them see the linked data structure as the candidate to replace MARC 21 for exchanging bibliographic data, activities in this direction can be anticipated. On the other hand, the respondents do not agree on a possible unification of the structure; accordingly, pressure for unification is unlikely, at least for the time being.

3 Summary

Once again, we should recall what purpose exchangeable formats of bibliographic data serve in librarianship: recording bibliographic data and transferring (exchanging) them between bibliographic agencies and various institutions (KTD). It already seems clear, however, that these two functions may become separated, or at least that a further function, the publishing of bibliographic data on the web, is splitting off. Although the library community has long discussed the obsolescence of the MARC format, and in spite of the results of the above enquiry, this structure may be retained, at least for the routine storage (cataloguing) of data and their exchange between library systems. This applies even though the Library of Congress in Washington is testing the linked data structure for cataloguing purposes as well. The reasons most frequently given are the unpreparedness of the existing library systems6 and the lack of funds for carrying out the change.

Most libraries following developments in this field, whether as consumers or as developers, expect to use linked data solely for presenting data on the web (i.e. not for exchanging bibliographic records between libraries) and for opening up their collections to users directly through the search interfaces of global search engines. Freely accessible conversion programs are already in use for this purpose, and commercial conversion systems are offered by companies that took part in the development. A survey organized by the French National Library in 2014 found that, once the library had made its catalogues accessible to search engines, fully 80% of all queries arrived this way rather than via the OPAC. The survey also found that these users were mostly unaware of the website of the library catalogue in question (ADAMICH, 2015b).

The leading linked data models are BIBFRAME (Bibliographic Framework), developed under the auspices of the Library of Congress in Washington, and Schema.org, which OCLC has implemented for its catalogue WorldCat. An extension of Schema.org for the library domain is under way. Criticism of the slow progress, among other reasons, nevertheless leads other American and European libraries to experiment on their own, which gradually gives rise to various local versions. Since the institutions involved can be expected to collaborate, the desired compatibility may be anticipated.

Reinhold Heuvelmann of the German National Library predicts that the last library systems supporting the MARC format will disappear in 2060. Before that, in 2047, an article will be published. Its title? BIBFRAME must die7 (HEUVELMANN).

The editors did not intervene in the method of citation in this article.

Footnotes

1 MARC – Machine Readable Cataloguing, a format created in its first version at the Library of Congress in Washington to provide bibliographic data in machine-readable form, distributed on magnetic tape to American libraries so that they could print their own catalogue cards. The name MARC 21 means MARC for the 21st century.

2 Technological basis for exchanging data in the web environment as an application of XML

3 Internet protocol designed for exchanging hypertext documents in the web environment

4 XML – Extensible Markup Language, a language suitable for exchanging data among applications and for publishing documents. MARC/XML is an application of an XML schema created by the Library of Congress in Washington for exchanging bibliographic data between systems or publishing them. Conversion between MARC 21 and MARC/XML is lossless and can be carried out with the freely available MARC Tool Kit on the Library of Congress website (http://www.loc.gov/standard/marcxml/).

5 This article uses the responses received in the first round of questions. For the purpose of the dissertation, the results will be submitted to the respondents of sample A (sample B having been anonymous) with a request to clarify or correct them as the case may be. The answers received are to be gradually reconciled. Agreement will be considered a prediction of future development.

6 An analysis of the preparedness of library systems is part of the planned dissertation. In the author's opinion, the survey shows that library systems are currently unprepared for the change.

7 The title is a reference to the famous article “MARC must die” by R. Tennant.

Dec 21, 2016