Frequently Asked Questions

Q1: Why is Harvard University making its library catalog metadata publicly available?

A: Open access to data and metadata is a cornerstone value of the Harvard Library. From the Open Collections Program to harvestable metadata from DASH (Harvard’s open access scholarly repository) and a range of digital collections, Harvard libraries have long been working to open collections and metadata for public use and reuse. With growing interest in, and benefits from, integrating library information into the web, the time seems right to support innovation in this space by releasing as much metadata as we can.

Q2: Most libraries make their catalogs available online. How is this different?

A: Library catalogs use metadata to support online searching of information about library collections, but catalogs generally do not make the metadata itself available for harvesting so that it can be reused in innovative ways. Library metadata is the foundational information, such as author, title, publisher, and subject, about the books, journals, and many other forms of knowledge traditionally collected by libraries. These are important cultural objects, but because libraries have not made the information about them available in a reliable, reusable form, it has been hard for developers to create applications that make full use of them. Harvard hopes not only that the release of its catalog metadata will enrich the Web ecosystem, but also that more institutions will be encouraged to release their metadata.

Q3:  How many records will be available?

A: The initial dataset includes over 12 million records for items such as books, journals, manuscripts, electronic resources, archival collections, audio, video, scores, and other formats held by Harvard's dozens of libraries.

Q4: How is Harvard making this metadata available?

A: You can download the MARC21 records from Harvard. MARC21 is the standard format for encoding bibliographic information. In addition, developers can get programmatic access to that information through an API offered by the Digital Public Library of America (DPLA) beta platform. An API (application programming interface) enables a computer program to request information from a site. So, if you want to write a program that, for example, retrieves information about books classified both in science and in cooking, your program could make that request via the API.
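
To make the API idea concrete, here is a minimal sketch in Python of how a program might build such a request for items classified under both science and cooking. The base URL, the parameter names (subject, page_size), and the "AND" syntax are placeholders chosen for illustration only; they are not the DPLA platform's actual interface, so consult the platform's own documentation for the real endpoint and query format.

```python
from urllib.parse import urlencode

# Placeholder endpoint: NOT the DPLA's real URL (assumption for illustration).
EXAMPLE_BASE_URL = "https://api.example.org/items"


def build_subject_query(subjects, base_url=EXAMPLE_BASE_URL, page_size=10):
    """Return a URL asking for items that carry every subject in `subjects`."""
    params = {
        # Hypothetical boolean syntax: join the subjects with AND.
        "subject": " AND ".join(subjects),
        # Hypothetical paging parameter.
        "page_size": page_size,
    }
    # urlencode escapes spaces and other special characters for use in a URL.
    return f"{base_url}?{urlencode(params)}"


url = build_subject_query(["science", "cooking"])
```

A program would then send an HTTP GET request to the resulting URL and read the structured response the service returns.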

Q5: What is Harvard's relationship with the DPLA? 

A:  The DPLA is an independent national organization, with a diverse steering committee and contributors from all across the nation. Harvard’s metadata is being used in the DPLA beta platform. 

Q6:  Is Harvard placing any restrictions on use of the data?

A: No, Harvard is not imposing any limitations on use of the data. However, we are requesting that users comply with a simple set of Harvard Library community norms. These norms request attribution and ask that anyone who improves the data make those improvements equally freely available. In addition, for data that originated in WorldCat, at OCLC’s request, we are asking users to observe the WorldCat community norms. We believe that observing these community norms will help promote good practices, foster trust among partners, and encourage growth of the open metadata community.

Q7: What is Creative Commons Zero (CC0), and why is Harvard releasing the metadata under CC0?

A: CC0 is a public domain designation developed by Creative Commons for use when a person wants to relinquish all copyright and related rights the person has in a work. More information about CC0 1.0 is available from Creative Commons. With the CC0 public domain designation, Harvard waives any copyright and related rights it holds in the metadata. We believe that this will help foster wide use and yield developments that will benefit the library community and the public.

Q8: How big is the downloadable file?

A: The initial set of MARC21 records consists of a single file of approximately 3.1 gigabytes.

Q9:  Is it possible to get selected subsets of records from the database?

A: Only the full set is available as a downloadable file.  Applications that incorporate the metadata, including the DPLA beta platform, can provide additional functionality.

Q10: What are potential uses for the downloadable file of MARC21 records and API access?

A: You'd want the downloadable file of MARC21 records if you want to do some intense processing of the data, or if you want to integrate those records into other data you already have. You might use the API if your site or application needs to pull up information about items as part of a service it’s providing to users. For example, if you have a site that lets people review textbooks, you might use the API to fetch the page count of a book as a user begins a review.
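
For the bulk-processing case, the following is a minimal sketch in Python of streaming through a MARC21 file one record at a time, so that a multi-gigabyte download never has to fit in memory. It relies only on the standard MARC21 record layout (a 24-byte leader, a directory of 12-byte entries, then the field data). The function names are illustrative, and the sketch assumes the records are encoded in UTF-8; MARC21 also permits the MARC-8 encoding, which this code does not handle.

```python
import io
from typing import Dict, Iterator, List

FIELD_END = b"\x1e"   # terminates the directory and each field
SUBFIELD = b"\x1f"    # introduces each subfield inside a data field


def read_records(stream: io.BufferedIOBase) -> Iterator[bytes]:
    """Yield raw MARC21 records one by one from a binary stream."""
    while True:
        head = stream.read(5)              # leader bytes 0-4: record length
        if len(head) < 5:
            return                         # end of file
        length = int(head.decode("ascii"))
        body = stream.read(length - 5)
        if len(body) < length - 5:
            return                         # truncated final record; stop
        yield head + body


def parse_record(raw: bytes) -> Dict[str, List[Dict[str, str]]]:
    """Map each data-field tag to a list of {subfield code: value} dicts."""
    base = int(raw[12:17].decode("ascii"))  # leader bytes 12-16: start of data
    directory = raw[24:base - 1]            # drop the directory's terminator
    fields: Dict[str, List[Dict[str, str]]] = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        field_len = int(entry[3:7].decode("ascii"))
        start = int(entry[7:12].decode("ascii"))
        if tag < "010":
            continue                        # control fields have no subfields
        data = raw[base + start: base + start + field_len].rstrip(FIELD_END)
        subfields: Dict[str, str] = {}
        # The first chunk holds the two indicator characters; skip it.
        for chunk in data.split(SUBFIELD)[1:]:
            if chunk:
                code = chr(chunk[0])
                value = chunk[1:].decode("utf-8", errors="replace")
                subfields.setdefault(code, value)  # keep first occurrence
        fields.setdefault(tag, []).append(subfields)
    return fields
```

To use it, open the downloaded file in binary mode (for example, open("records.mrc", "rb"), where the filename is hypothetical) and pass the file object to read_records, calling parse_record on each result. In MARC21, field 245 subfield a holds the title and field 300 subfield a holds the extent, which is where a page count such as "432 p." typically appears.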

Q11: Will the data be updated?

A: Yes, Harvard Library plans to update the records in this dataset on a weekly basis.

Q12:  Have other libraries released records?

A: Some have. For example, the British Library has released 3 million records, Cologne libraries have released 5.4 million, and the University of Cambridge has released 3.6 million. OCLC has released 8 million bibliographic records as part of the OhioLINK–OCLC Collection and Circulation Analysis Project.

Rev. April 23, 2012