Data sets for historical events

Special data sets – prints

Zeitschriftendatenbank

The Union Catalogue of Serials (ZDB), which is jointly maintained by the Deutsche Nationalbibliothek (DNB) and the Staatsbibliothek zu Berlin (SBB), contains records of journals, newspapers, book series and other serial publications from all countries, in all languages, from all periods, in printed, electronic or digitized form. The bibliographic records are supplemented by the corresponding holding records of libraries in Germany and Austria.

The current search interface of the ZDB Catalogue provides various search functions, including the visualization of title relations, a timeline for title histories and changes, a map of collections and a graphic chart of collections.


The ZDB provides various interfaces and data services on the basis of the centrally recorded data.

Interfaces

OAI-PMH

The OAI Protocol for Metadata Harvesting (OAI-PMH) is an XML-based protocol for querying and transferring metadata between a data provider and a service provider that offers customised research services based on the harvested data.

The ZDB provides an OAI interface for querying bibliographic and holding records. The scope of the delivered data can be restricted by selecting specific time intervals. Data formats available include MARC21 and OAI DC.

Access to the OAI interface requires you to register. For more information on OAI and registration, please visit the DNB website at http://www.dnb.de/oai or contact the DNB Interface Service via schnittstellen-service@dnb.de.
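As an illustration, a selective harvest might look like the following minimal Python sketch (using the requests library). The endpoint URL, the metadataPrefix value and the handling of access credentials are assumptions made for this example; please take the exact values from the registration information and the DNB documentation.

# Minimal OAI-PMH harvesting sketch (illustrative); access requires registration,
# credentials are omitted here, and the endpoint URL and metadataPrefix are assumptions.
import requests
import xml.etree.ElementTree as ET

BASE_URL = "https://services.dnb.de/oai/repository"  # assumed endpoint, see http://www.dnb.de/oai

params = {
    "verb": "ListRecords",
    "metadataPrefix": "MARC21-xml",  # assumed value; "oai_dc" for Dublin Core
    "from": "2023-01-01",            # selective harvesting by time interval
    "until": "2023-01-31",
}

response = requests.get(BASE_URL, params=params, timeout=60)
response.raise_for_status()

root = ET.fromstring(response.content)
ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
for record in root.findall(".//oai:record", ns):
    header = record.find("oai:header", ns)
    print(header.findtext("oai:identifier", namespaces=ns))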

SRU

Search/Retrieve via URL (SRU) is an HTTP-based protocol for querying bibliographic databases; it is maintained and further developed by the Library of Congress. The ZDB has replaced the Z39.50 protocol with SRU and thus meets the requirements of modern web development.

SRU queries are written in the Contextual Query Language (CQL) and sent to the SRU server via GET or POST. The response is returned as XML.

SRU interfaces

The ZDB offers two SRU interfaces:

• ZDB Catalogue: bibliographic records and holding records; base URL: http://services.dnb.de/sru/zdb (SRU Explain operation available)
• ZDB Address Database / ISIL and Library Identifier Index: address data, library identifiers and ISILs; base URL: http://services.dnb.de/sru/bib (SRU Explain operation available)

Both interfaces are currently still based on SRU version 1.1 and CQL version 1.1, Level 2.
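For illustration, a searchRetrieve request against the ZDB Catalogue interface could be sent as in the following minimal Python sketch. The bare CQL term (searching the default index) and the recordSchema value are chosen for illustration only; the indexes and schemas actually supported can be taken from the Explain operation.

# Illustrative SRU 1.1 searchRetrieve request against the ZDB Catalogue.
import requests
import xml.etree.ElementTree as ET

BASE_URL = "http://services.dnb.de/sru/zdb"

params = {
    "version": "1.1",
    "operation": "searchRetrieve",
    "query": "Berliner Tageblatt",   # bare CQL term against the default index (illustrative)
    "recordSchema": "MARC21-xml",    # assumed schema name, see the formats listed below
    "maximumRecords": "10",
}

response = requests.get(BASE_URL, params=params, timeout=30)
response.raise_for_status()

root = ET.fromstring(response.content)
ns = {"srw": "http://www.loc.gov/zing/srw/"}
print("hits:", root.findtext(".//srw:numberOfRecords", namespaces=ns))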

Formats

The following formats are available:

• MARC21-xml: XML variant of MARC21; title data or ISIL / address data; schema: MARCXML
• MARC21plus-1-xml: XML variant of MARC21 with title and local data; schema: MARCXML
• oai_dc: selection of Dublin Core elements; title data and ISIL / address data; schema: OAI Dublin Core
• PicaPlus-xml: XML version of PicaPlus; ISIL / address data; schema: PicaPlus-XML
• RDF/XML: RDF representation of bibliographic data (RDF/XML serialization of title data); specification: RDF/XML Syntax Specification

Linked Open Data

The ZDB offers access to its title data as Linked Open Data.

Modelling of the title data is based on the recommendations for the RDF representation of bibliographic data of the Title Data Group of the DINI-AG KIM (see Vocabularies Used).

Only the most important data of each title are displayed. However, the scope of the fields converted into RDF will be expanded over time. The service provided here should therefore be seen as an intermediate stage of the data modelling currently under development.

Service and data model

Bibliographic records are encoded in the Resource Description Framework (RDF). As RDF serializations, the records are available in RDF/XML, Terse RDF Triple Language (Turtle) and JSON-LD.

The ZDB Linked Data Service has been developed with regard to the W3C Best Practices (Cool URIs for the Semantic Web) and is based on URIs with 303 Redirect and Content Negotiation.

Using content negotiation, the ZDB linked data service attempts to find the appropriate representation of the data for the respective client and returns a corresponding content type.
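As a minimal sketch, content negotiation can be tested with a simple HTTP request. The resource URI pattern below is an assumption made for this example; please substitute the URI of an actual ZDB resource.

# Illustrative content negotiation against the ZDB Linked Data Service;
# the resource URI is a hypothetical example.
import requests

uri = "https://ld.zdb-services.de/resource/2736054-5"  # assumed URI pattern and ZDB ID

# Ask for Turtle; "application/rdf+xml" or "application/ld+json" work analogously.
response = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)
response.raise_for_status()

print(response.headers.get("Content-Type"))  # reflects the negotiated serialization
print(response.text[:500])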

Vocabularies Used

Bibliographic data in the ZDB are structured according to the recommendations for the RDF representation of bibliographic data of the Title Data Group of the DINI-AG KIM. The vocabularies and terms currently used are described using JSON-LD context objects:

JSON-LD context object for ZDB title data according to DINI-KIM recommendation
JSON-LD context object for ZDB title data according to DINI-KIM recommendation (with content type application/ld+json)

RDF dump

The ZDB data are provided as RDF dumps in the serializations RDF/XML, Turtle and JSON-LD on the open data download page of the German National Library.

HDT

The ZDB data are also available as HDT files. HDT (Header, Dictionary, Triples) is a compact binary serialization format for RDF that compresses large datasets to save disk space. It is possible to search directly in a compressed dataset. This makes it an ideal format for storing and sharing RDF datasets on the web.
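As a sketch, such a compressed dump can be queried directly, for example with the Python package hdt (pyHDT); the file name and the predicate URI below are placeholders chosen for illustration.

# Querying an HDT dump without decompressing it (illustrative, using pyHDT).
from hdt import HDTDocument

document = HDTDocument("zdb.hdt")  # placeholder file name

# Empty strings act as wildcards; the predicate is chosen for illustration.
triples, cardinality = document.search_triples("", "http://purl.org/dc/terms/title", "")
print("matching triples:", cardinality)
for subject, predicate, obj in triples:
    print(subject, obj)
    break  # show only the first match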

Changes and updates to the Linked Data Service will be announced via the DNB mailing list: http://lists.dnb.de/mailman/listinfo/lds.

Conditions for Use

All bibliographic data and a large part of the holding records are available under the Creative Commons Zero 1.0 license. Please refer to the data licensing information given by ZDB.

Please note that permission to use the interfaces is granted only on condition that the hosting function of the German National Library is not impaired by problems caused by downloading data.

Contact

Hans-Jörg Lieder, Carsten Klee


Bibliographic Data from StaBiKat

Introduction

Our online catalogue StaBiKat includes the metadata of the complete printed and digital collections of Staatsbibliothek zu Berlin from the publication years 1500 to the present. Currently there are about 14 million searchable records.

StaBiKat data (excerpt) – https://zenodo.org/record/2590752

Are you interested in working with basic bibliographic data from our online catalogue StaBiKat? Sets of records organized along language families are already available. The data sets consist of the most important metadata including PPN (catalogue identifier), author, title, place/country of publication, publisher, year of publication and language code. The data sets do not include the full metadata, nor do they represent the content of the catalogue in full. Moreover, you should take note of the date of the last update. If you require more up-to-date data, you can use the following scripts to create your own data sets.

Which information is missing here?

• records without a language code (slightly less than half of all SBB records)
• shelf marks and location identifiers (e.g. to identify items lost in the war)
• the existence of digital versions (including their PURLs)
• data sets selected on the basis of subject criteria or year of publication

These data sets are provided by the Gemeinsamer Bibliotheksverbund (GBV), the library network of which Stiftung Preussischer Kulturbesitz has been a member for over 20 years.

Interfaces

StaBiKat does not directly provide interfaces for exporting large quantities of data. However, for specific queries you can use the SRU or unAPI interfaces of the GBV network. You may have to get in touch with the network office of the GBV.

SRU – http://sru.k10plus.de/opac-de-1

SRU, which is used here in version 1.1, is an HTTP-based protocol for machine-based queries of bibliographic data. You can use this interface to import data for your catalogues, subject gateways or for the digitisation of your objects.

The retrieval language used is the Contextual Query Language. You can use the StaBiKat SRU interface to run concise queries yielding a limited set of results. Metadata provided here come in the Dublin Core (DC, v 1.1) and MODS (v 3.4) formats.

Search syntax and indexes: http://sru.k10plus.de/opac-de-1

Examples for queries:

• A maximum of 10 titles in StaBiKat that contain the words "Pupillen" and "Edict", in MODS format
http://sru.k10plus.de/opac-de-1?version=1.1&operation=searchRetrieve&query=pica.xtit=pupillen+edict&maximumRecords=10&recordSchema=mods
• Person search for Konrad Adenauer through the full StaBiKat database, output in Dublin Core, with a maximum of 300 results
http://sru.k10plus.de/opac-de-1?version=1.1&operation=searchRetrieve&query=pica.xprs=adenauer,konrad&maximumRecords=300&recordSchema=dc
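As a sketch, the second example query could be run and evaluated in Python as follows (using the requests library; the XML handling is kept deliberately minimal):

# Run the Konrad Adenauer query from above and print the Dublin Core titles.
import requests
import xml.etree.ElementTree as ET

BASE_URL = "http://sru.k10plus.de/opac-de-1"

params = {
    "version": "1.1",
    "operation": "searchRetrieve",
    "query": "pica.xprs=adenauer,konrad",
    "recordSchema": "dc",
    "maximumRecords": "20",
}

response = requests.get(BASE_URL, params=params, timeout=30)
response.raise_for_status()

root = ET.fromstring(response.content)
ns = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
print("hits:", root.findtext(".//srw:numberOfRecords", namespaces=ns))
for title in root.findall(".//dc:title", ns):
    print(title.text)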

SRU – K10-PLUS Network Catalogue

The corresponding queries on the level of the GBV network are:
http://sru.k10plus.de/gvk7?version=1.1&operation=searchRetrieve&query=pica.tit=pupillen+edict&maximumRecords=50&recordSchema=mods
and
http://sru.k10plus.de/gvk7?version=1.1&operation=searchRetrieve&query=pica.prs=adenauer,konrad&maximumRecords=300&recordSchema=dc

unAPI

UnAPI provides a straightforward web-based method for retrieving individual records in different formats. The unAPI interface does not enable searches across whole data collections but only provides individual records referenced by an identifier. Each query therefore has to include an unambiguous identifier for the respective record and the required metadata format (cf. https://wiki.k10plus.de/display/K10PLUS/UnAPI, para. 1).

If you want to download individual records whose PPN you know, you can use the unAPI interface of StaBiKat and GBV network catalogue as follows:

StaBiKat – http://unapi.k10plus.de/?id=opac-de-1

Syntax:
http://unapi.k10plus.de/?id=opac-de-1:ppn:##########&format=dc

Example:
http://unapi.k10plus.de/?id=opac-de-1:ppn:1000127265&format=dc

Network catalogue unAPI – http://unapi.k10plus.de/

Syntax:
http://unapi.k10plus.de/?id=gvk:ppn:##########&format=mods

Example:
http://unapi.k10plus.de/?id=gvk:ppn:178293199&format=mods

Just add the PPN in place of ########## and select the output format (e.g. "mods"). The GBV network catalogue additionally allows retrieval in the Pica, Dublin Core and MARC formats. Please note that the unAPI interface of StaBiKat only returns results in the Dublin Core and MODS formats.
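A minimal Python sketch of such an unAPI request, using the StaBiKat example PPN from above:

# Retrieve a single record by PPN via the unAPI interface of StaBiKat.
import requests

PPN = "1000127265"  # example PPN from above

response = requests.get(
    "http://unapi.k10plus.de/",
    params={"id": f"opac-de-1:ppn:{PPN}", "format": "dc"},  # "mods" works as well
    timeout=30,
)
response.raise_for_status()
print(response.text)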

Conditions of Use

SBB pursues an Open Data Policy and provides its metadata for free under the CC0 licence. The conditions of use for the interface service are defined by GBV.

Contact

Andrea Jacobs

ZEFYS

Introduction

The ZEitungsinFormationssYStem ZEFYS offers access to the digitized historical newspapers of Staatsbibliothek zu Berlin.

Currently ZEFYS provides access to 276,015 issues of 193 historical newspapers from Germany and of German-language newspapers published abroad.

Interfaces

For legal reasons, the APIs listed here are only available for the public domain titles in ZEFYS. For the contents of the “DDR Presse” portal we unfortunately cannot provide direct access to the data.

Retrieval of content (images and full text) for digitised newspapers is supported via the International Image Interoperability Framework (IIIF) protocol. An increasing number of free clients and libraries for IIIF in numerous programming languages are available on the web.

Currently, digitised newspaper images and metadata can be retrieved by requests following the schema:
http://content.staatsbibliothek-berlin.de/zefys/SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}

The ZDB-ID is a unique identifier for every newspaper title and can be found either within the ZEFYS newspaper portal or directly from the ZDB.

Next, a date of issue needs to be specified in the YYYYMMDD format, e.g. 18900101 for the issue published on January 1st, 1890. If you want to see which date ranges of a specific title have already been digitised, please refer to the ZEFYS newspaper portal.

To retrieve the scanned images of a newspaper, further information needs to be specified in the URL, such as appending /full/{width in pixel},/0/default.jpg, where the width in pixels can be chosen freely and the height will be calculated automatically, e.g.
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/1200,/0/default.jpg
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/250,/0/default.jpg
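As a sketch, the two example images above could be downloaded in Python as follows:

# Download a page image of the example issue in two sizes via IIIF.
import requests

BASE = "https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0"

for width in (1200, 250):
    url = f"{BASE}/full/{width},/0/default.jpg"
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    filename = f"SNP27974534-19010712-page1-{width}px.jpg"
    with open(filename, "wb") as fh:
        fh.write(response.content)
    print("saved", filename, len(response.content), "bytes")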

The IIIF format allows further image manipulations via the URL. Besides changing the size of the image, it is possible to view a section of the image or to rotate it. The following example delivers a 300 x 300 pixel section rotated by 90°.
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/1000,1000,300,300/full/90/default.png

It is also possible to retrieve the original TIFF images via IIIF by replacing the width in pixels with full and specifying default.tif instead of default.jpg in the URL, as follows:
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/full/0/default.tif

By combining the page number 0 with the ending .xml in the URL, the metadata METS document for each newspaper title can be obtained, e.g.
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-0-0-0.xml

Further working examples:
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/full/0/default.tif -> TIFF, page 1
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/1200,/0/default.jpg -> JPG, page 1
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0.pdf -> PDF, page 1
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-0-0-0.pdf -> PDF, all pages
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0.xml -> ALTO, page 1
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-0-0-0.xml -> METS

Full texts

For the project Amtspresse Preußens full texts in different formats can be retrieved.

For the newspaper Teltower Kreisblatt the full texts are delivered in ALTO format. Compared to retrieving the METS file, it is necessary to insert the letter A for the issue and to use the correct page number in the URL. Then, instead of the METS data, the OCR data in ALTO format will be delivered for every page:
https://content.staatsbibliothek-berlin.de/zefys/SNP25128437-18580116-A-1-0-0.xml for page 1,
https://content.staatsbibliothek-berlin.de/zefys/SNP25128437-18580116-A-2-0-0.xml for page 2 etc.
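As a sketch, the plain text of a page can be extracted from such an ALTO file in Python, taking the words from the CONTENT attributes of the String elements:

# Fetch the ALTO file of page 1 (example above) and print its plain text.
import requests
import xml.etree.ElementTree as ET

url = "https://content.staatsbibliothek-berlin.de/zefys/SNP25128437-18580116-A-1-0-0.xml"
response = requests.get(url, timeout=60)
response.raise_for_status()

root = ET.fromstring(response.content)
words = [
    element.attrib["CONTENT"]
    for element in root.iter()
    if element.tag.endswith("}String") and "CONTENT" in element.attrib
]
print(" ".join(words))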

For the other newspapers with full text, Provinzial-Correspondenz and Neueste Mittheilungen, the data are saved in a free XML format.
Here the issue has to be specified with the letter F and the page number with 0, because the whole full text is contained in one file.
Examples of OCR data delivery are, for the Provinzial-Correspondenz:
https://content.staatsbibliothek-berlin.de/zefys/SNP9838247-18770117-F-0-0-0.xml
and for the Neueste Mittheilungen:
https://content.staatsbibliothek-berlin.de/zefys/SNP11614109-18930721-F-0-0-0.xml

Conditions of Use

Contact

Kalliope Union Catalog

Kalliope is a Union Catalog for collections of personal papers, manuscripts, and publishers’ archives and the National Information System for these material types.

More than 19,300 holdings from more than 950 institutions with a total of more than three million individual items are currently indexed online. Kalliope contains metadata of correspondence archives, manuscripts, private and professional document files, diaries, family albums, lecture notes, photographs, posters, films, music, but also locks of hair, … by and about 600,000 people and 100,000 organizations.

Interfaces

SRU – http://kalliope-verbund.info/sru?version=1.2

SRU, here used in version 1.2, is an HTTP-based protocol for the automated retrieval of bibliographic data. By using this interface you can use the data for your catalogues, your subject portals, or for digitising your objects.

The retrieval language used is the Contextual Query Language; it is also used for the expert search. Data from Kalliope are available in the formats Dublin Core (DC, v. 1.1) and Metadata Object Description Schema (MODS, v. 3.4).

Query the SRU interface

Indexes can be queried directly (see index list), e.g.:
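For example (illustrative searchRetrieve requests; the index names ead.creator and ead.id are given only as examples and should be verified against the index list):

http://kalliope-verbund.info/sru?version=1.2&operation=searchRetrieve&query=ead.creator=%22Humboldt,+Alexander+von%22&recordSchema=mods&maximumRecords=10
http://kalliope-verbund.info/sru?version=1.2&operation=searchRetrieve&query=ead.id=DE-611-HS-2321418&recordSchema=dc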

Documentation of data formats

The overview of the elements for MODS and Dublin Core can be found here (PDF).

The URL in ./recordIdentifier/@url (MODS format) is a persistent URL. It consists of the domain name http://kalliope-verbund.info/ + record number: http://kalliope-verbund.info/{ID}, e.g. http://kalliope-verbund.info/DE-611-HS-2321418.

Further examples

Conditions for Use

The vast majority of the data are licensed under CC BY-SA. Only licenses that differ are specified in the individual data record.

Contact

Gerhard Müller

Digitised Collections

Introduction

You probably already know our Digitised Collections, where currently (November 2020) roughly 175,000 digitised objects from the archives of the SBB are presented online. With a variety of features (for features currently under development, have a look at the beta version of the Digitised Collections), we hope to make searching and browsing our digitised objects easy and efficient for you.

But what if you want to obtain these data in order to process them or to integrate them into your own application? For this purpose we provide several technical interfaces (APIs).

 

Interfaces

Currently the Digitised Collections provide two interfaces: OAI-PMH and IIIF.

1. OAI-PMH – https://oai.sbb.berlin

Retrieval of metadata for objects in the Digitised Collections is provided via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard. A wide range of client applications for OAI-PMH in numerous programming languages is freely available on the web.

The base URL for the OAI-PMH endpoint of the digitised collections of the SBB is
https://oai.sbb.berlin/

Using the six verbs provided by OAI-PMH, requests such as the following can be generated:
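For example (the three requests below use the administrative verbs Identify, ListMetadataFormats and ListSets, which need no further parameters):

https://oai.sbb.berlin/?verb=Identify
https://oai.sbb.berlin/?verb=ListMetadataFormats
https://oai.sbb.berlin/?verb=ListSets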

The SBB implements Dublin Core (DC) for basic bibliographic metadata and METS for all metadata about the contents and structure of a digital object.

By combining OAI-PMH verbs and the DC metadata, more specific requests can be formulated, such as:
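For example (the date range in the second request is purely illustrative; selective harvesting by date is part of the OAI-PMH standard):

https://oai.sbb.berlin/?verb=ListRecords&metadataPrefix=oai_dc
https://oai.sbb.berlin/?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2020-01-01&until=2020-06-30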

The response contains a unique identifier for each digital object, the PPN, e.g. oai:digital.staatsbibliothek-berlin.de:PPN867445300. Using the PPN, additional information about a digital object can be retrieved:
https://oai.sbb.berlin/?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300

By changing the metadata prefix to mets, the complete METS metadata record, containing references to all related files (images, OCR), can be retrieved:
https://oai.sbb.berlin/?verb=GetRecord&metadataPrefix=mets&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300

The METS file contains a section <fileSec> which holds child elements of the type <fileGrp>; these in turn contain references to the various files that belong to the digital object, typically images in JPG, PNG or TIFF format:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.jpg
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.png
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.tif
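As a sketch, the file references can be read from the METS record in Python, using the GetRecord request shown above:

# List the file URLs referenced in the <fileSec> of the METS record, grouped by <fileGrp>.
import requests
import xml.etree.ElementTree as ET

PPN = "PPN867445300"
url = (
    "https://oai.sbb.berlin/?verb=GetRecord&metadataPrefix=mets"
    f"&identifier=oai:digital.staatsbibliothek-berlin.de:{PPN}"
)
response = requests.get(url, timeout=60)
response.raise_for_status()

ns = {"mets": "http://www.loc.gov/METS/"}
root = ET.fromstring(response.content)
for file_grp in root.findall(".//mets:fileSec/mets:fileGrp", ns):
    print("fileGrp:", file_grp.get("USE"))
    for flocat in file_grp.findall(".//mets:FLocat", ns):
        print("  ", flocat.get("{http://www.w3.org/1999/xlink}href"))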

2. IIIF

Retrieval of content (images and full text) from the Digitised Collections is supported via the International Image Interoperability Framework (IIIF) protocol. Here, too, a growing number of free clients and libraries for IIIF in numerous programming languages is available on the web.

Currently, digitised images, metadata and fulltext data can be retrieved by requests following this schema:
https://content.staatsbibliothek-berlin.de/dc/{PPN}-{Page}

The PPN is a unique ID for every work in the Digitised Collections.

To get scanned images for a specific object, further parameters following the IIIF protocol have to be provided in the URL, such as /full/{width in pixel},/0/default.jpg, where the width in pixels can be chosen freely and the height is adjusted automatically, e.g.
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/1200,/0/default.jpg
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/800,/0/default.jpg
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/250,/0/default.jpg

The IIIF protocol permits further image manipulations via the URL, e.g. cropping, resizing and rotating the image. The following example delivers an image detail, sized 300 x 300 px, rotated by 90°.
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/100,100,300,300/full/90/default.png

Furthermore, it is possible to retrieve the original TIFF image; just change default.jpg to default.tif in the URL:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.tif

For even more possibilities for manipulating single images, have a closer look at the IIIF Image API 2.1.1, which is implemented by the content server.

Additionally, the content server can deliver further data for specific works. An overview of the additional functions can be found in the NGCS routes documentation. This includes, for example, dynamic highlighting on the images. The highlighted areas are defined in the same way as image sections. As a further parameter, a colour can be specified as a hex code: https://content.staatsbibliothek-berlin.de/dc/PPN646236717-00000011/full/1200,/0/default.jpg?highlight=55,100,120,100|1150,460,110,80&highlightColor=ff0000

As specified in the IIIF Presentation API, the IIIF manifest file of an object can be retrieved with the URL:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300/manifest
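As a sketch, the manifest can be retrieved and its pages (canvases) listed in Python; the traversal below assumes the structure of a IIIF Presentation API 2.x manifest:

# Retrieve the IIIF manifest of the example object and list its canvases.
import requests

manifest_url = "https://content.staatsbibliothek-berlin.de/dc/PPN867445300/manifest"
manifest = requests.get(manifest_url, timeout=60).json()

print(manifest.get("label"))
for sequence in manifest.get("sequences", []):
    for canvas in sequence.get("canvases", []):
        print(canvas.get("label"), "->", canvas.get("@id"))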

This manifest can be loaded in every IIIF viewer, e.g. in the Mirador viewer, hosted by the SBB:
https://mirador.staatsbibliothek-berlin.de/?manifest=https://content.staatsbibliothek-berlin.de/dc/PPN897443810/manifest&manifest=https://content.staatsbibliothek-berlin.de/PPN876457189/manifest

In addition to the manifest, the metadata can also be retrieved in METS/MODS format:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300.mets.xml

Fulltexts

By adding the page number and the suffix .ocr.xml to the URL, the OCR file in ALTO format is delivered for each page:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-0009.ocr.xml for page 9,
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-0010.ocr.xml for page 10 etc.

The OCR files can also be downloaded in their entirety, packed in a ZIP file.
https://content.staatsbibliothek-berlin.de/dc/PPN867445300.ocr.zip
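As a sketch, the ZIP file can be downloaded and the plain text of every contained ALTO page extracted in Python as follows:

# Download the complete OCR package and print the beginning of each page's text.
import io
import zipfile
import requests
import xml.etree.ElementTree as ET

url = "https://content.staatsbibliothek-berlin.de/dc/PPN867445300.ocr.zip"
response = requests.get(url, timeout=120)
response.raise_for_status()

with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
    for name in sorted(archive.namelist()):
        if not name.endswith(".xml"):
            continue
        root = ET.fromstring(archive.read(name))
        words = [
            e.attrib["CONTENT"]
            for e in root.iter()
            if e.tag.endswith("}String") and "CONTENT" in e.attrib
        ]
        print(name, ":", " ".join(words)[:80])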

Conditions of Use

SBB pursues an Open Data Policy and endeavours to make all digitised works published before 1920 available to the public under a Public Domain Mark 1.0 licence. In exceptional cases and for works published later than 1920, different licences may be used.

You can identify the licence that applies to an object in the Digitised Collections by displaying the complete bibliographic information about the object and scrolling down to the entry "licence / rights info".

Of course, you can also find this information in the metadata in METS format under <mods:accessCondition>.

Special data sets

For the Hackathon Coding Gender: Women In Cultural Data, which took place at the end of August 2019, thematic datasets were provided, which are described and listed here.

Contact