ZEFYS

Introduction

The ZEitungsinFormationssYStem ZEFYS offers access to the digitized historical newspapers of Staatsbibliothek zu Berlin.

Currently ZEFYS provides access to 276,015 issues of 193 historical newspapers from Germany and of German-language newspapers published abroad.

Interfaces

For legal reasons, the APIs listed here are only available for the public domain titles in ZEFYS. Unfortunately, we cannot provide direct access to the data for the contents of the “DDR Presse” portal.

Retrieval of content (images and full text) for digitised newspapers is supported via the International Image Interoperability Framework (IIIF) protocol. An increasing number of free IIIF clients and libraries in numerous programming languages is available on the web.

Currently, digitised newspaper images and metadata can be retrieved by requests following the schema:
http://content.staatsbibliothek-berlin.de/zefys/SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}

The ZDB-ID is a unique identifier for every newspaper title and can be found either within the ZEFYS newspaper portal or directly from the ZDB.

Next, a date of issue needs to be specified in the YYYYMMDD format, e.g. 18900101 for the issue published on January 1st, 1890. If you want to see which date ranges of a specific title have already been digitised, please refer to the ZEFYS newspaper portal.
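
The schema above can be sketched in Python as follows. This is a minimal illustration, not part of the ZEFYS interface: the function name `zefys_id` and its default values are assumptions, and the ZDB-ID is assumed to be written without a hyphen, as in the example URLs on this page.

```python
from datetime import date

# Base URL as used in the example URLs on this page.
BASE_URL = "https://content.staatsbibliothek-berlin.de/zefys/"


def zefys_id(zdb_id: str, issue_date: date, issue: str = "0",
             page: int = 0, article: int = 0, version: int = 0) -> str:
    """Assemble SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}."""
    # Format the date by hand so that the result is always eight digits.
    day = f"{issue_date.year:04d}{issue_date.month:02d}{issue_date.day:02d}"
    return f"SNP{zdb_id}-{day}-{issue}-{page}-{article}-{version}"


# Page 1 of the issue published on July 12th, 1901.
print(BASE_URL + zefys_id("27974534", date(1901, 7, 12), page=1))
```

Using a `date` object instead of a raw string rejects impossible dates (such as February 30th) before a request is sent.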

To retrieve the scanned images of a newspaper page, further IIIF parameters need to be appended to the URL: /full/{width in pixel},/0/default.jpg, where the width in pixels can be chosen freely and the height is calculated automatically, e.g.
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/1200,/0/default.jpg
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/250,/0/default.jpg

The IIIF protocol allows further image manipulations via the URL. Besides changing the size of the image, it is possible to request a section of the image or to rotate it. The following example delivers a 300 x 300 pixel section rotated by 90°.
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/1000,1000,300,300/full/90/default.png

It is also possible to retrieve the original TIFF images via IIIF by replacing the width in pixel with full and specifying default.tif instead of default.jpg in the URL as follows:
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/full/0/default.tif
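
The image requests above all follow the IIIF pattern {region}/{size}/{rotation}/{quality}.{format}. A minimal Python sketch of that pattern is given here; the helper name `iiif_image_url` and its parameters are illustrative assumptions, and only the options used in the examples above are covered.

```python
from typing import Optional, Tuple


def iiif_image_url(image_id_url: str,
                   region: Optional[Tuple[int, int, int, int]] = None,
                   width: Optional[int] = None,
                   rotation: int = 0,
                   fmt: str = "jpg") -> str:
    """Build an IIIF image request for the given page URL.

    region   -- (x, y, width, height) in pixels, or None for the whole image
    width    -- target width in pixels, or None for the full size
    rotation -- clockwise rotation in degrees
    fmt      -- file format suffix such as "jpg", "png" or "tif"
    """
    region_part = "full" if region is None else ",".join(str(v) for v in region)
    # A trailing comma after the width lets the server calculate the height.
    size_part = "full" if width is None else f"{width},"
    return f"{image_id_url}/{region_part}/{size_part}/{rotation}/default.{fmt}"


PAGE = "https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0"
print(iiif_image_url(PAGE, width=1200))
print(iiif_image_url(PAGE, region=(1000, 1000, 300, 300), rotation=90, fmt="png"))
print(iiif_image_url(PAGE, fmt="tif"))
```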

By combining the page number 0 with the ending .xml in the URL, the metadata METS document for each newspaper title can be obtained, e.g.
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-0-0-0.xml

Further working examples:
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/full/0/default.tif -> TIF, page 1
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0/full/1200,/0/default.jpg -> JPG, page 1
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0.pdf -> PDF, page 1
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-0-0-0.pdf -> PDF, all pages
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-1-0-0.xml -> ALTO, page 1
https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-0-0-0.xml -> METS
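
The conventions in this list can also be read in the opposite direction. The following Python sketch splits a ZEFYS URL into its parts and reports which kind of resource it points to. The regular expression, the function name `describe_zefys_url` and the returned labels are assumptions made for illustration; they are derived only from the examples on this page.

```python
import re

# SNP{ZDB-ID}-{YYYYMMDD}-{Issue}-{Page}-{Article}-{Version}, optionally followed
# by a file suffix (".pdf", ".xml") or by IIIF image parameters ("/full/...").
ZEFYS_PATTERN = re.compile(
    r"SNP(?P<zdb_id>[0-9Xx]+)-(?P<date>\d{8})-(?P<issue>[0-9A-Za-z]+)"
    r"-(?P<page>\d+)-(?P<article>\d+)-(?P<version>\d+)"
    r"(?P<rest>[./].*)?$"
)


def describe_zefys_url(url: str) -> dict:
    """Return the identifier parts plus a 'kind' label for the resource."""
    match = ZEFYS_PATTERN.search(url)
    if match is None:
        raise ValueError(f"not a ZEFYS content URL: {url}")
    parts = match.groupdict()
    rest = parts.pop("rest") or ""
    page = int(parts["page"])
    if rest == ".pdf":
        kind = "PDF, all pages" if page == 0 else f"PDF, page {page}"
    elif rest == ".xml":
        if parts["issue"] == "F":
            kind = "full text, whole issue"  # convention used for Amtspresse titles
        elif page == 0:
            kind = "METS"
        else:
            kind = f"ALTO, page {page}"
    elif rest.startswith("/"):
        kind = f"IIIF image, page {page}"
    else:
        kind = "identifier only"
    parts["kind"] = kind
    return parts


url = "https://content.staatsbibliothek-berlin.de/zefys/SNP27974534-19010712-0-0-0-0.pdf"
print(describe_zefys_url(url))
```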

Full texts

For the project Amtspresse Preußens, full texts can be retrieved in different formats.

For the newspaper Teltower Kreisblatt, the full texts are delivered in the ALTO format. Compared to the request for the METS file, it is necessary to use the letter A for the issue and the correct page number in the URL. Instead of the METS data, the OCR data in the ALTO format will then be delivered for every page:
https://content.staatsbibliothek-berlin.de/zefys/SNP25128437-18580116-A-1-0-0.xml for page 1,
https://content.staatsbibliothek-berlin.de/zefys/SNP25128437-18580116-A-2-0-0.xml for page 2 etc.
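
Once an ALTO file has been downloaded, the recognised text can be read from the CONTENT attributes of its String elements. The Python sketch below shows one way to do this with the standard library. The function name `alto_to_text` and the sample document are made up for illustration, and the sketch makes no attempt to rejoin words hyphenated across lines.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix so the code works for any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(alto_xml) -> str:
    """Return the text of an ALTO document, one output line per TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)


# Made-up miniature ALTO document, only to demonstrate the function.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Teltower"/><SP/><String CONTENT="Kreisblatt"/></TextLine>
    <TextLine><String CONTENT="Berlin"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_to_text(SAMPLE_ALTO))
```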

For the other newspapers with full text, Provinzial-Correspondenz and Neueste Mittheilungen, the data are stored in a free-form XML format.
Here the issue has to be specified with the letter F and the page number with 0, because the whole full text is contained in one file.
An example of the OCR data for the Provinzial-Correspondenz:
https://content.staatsbibliothek-berlin.de/zefys/SNP9838247-18770117-F-0-0-0.xml
and for the Neueste Mittheilungen:
https://content.staatsbibliothek-berlin.de/zefys/SNP11614109-18930721-F-0-0-0.xml
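
Because the structure of this XML format is not described here, the following Python sketch does not rely on any element names. It simply collects all text nodes of a downloaded file and normalises the whitespace. The function name `xml_to_plain_text` and the sample input are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def xml_to_plain_text(xml_data) -> str:
    """Concatenate all text nodes of an XML document into one string.

    Text fragments are joined with a single space, so inline markup inside
    a word would introduce a space at that position.
    """
    root = ET.fromstring(xml_data)
    return " ".join(" ".join(root.itertext()).split())


# Hypothetical input; the real files use their own element names.
SAMPLE = "<doc><head>Provinzial-Correspondenz</head><p>Erster  Absatz.</p></doc>"
print(xml_to_plain_text(SAMPLE))
```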

Conditions of Use

Contact

Digitised Collections

Introduction

You probably know our Digitised Collections, where currently (November 2020) roughly 175,000 digitised objects from the archives of the SBB are available online. With a variety of features (for features currently under development, have a look at the beta version of the Digitised Collections), we hope to make it easy and efficient for you to search and browse our digitised objects.

But what if you want to retrieve the data in order to process them or integrate them into your own application? For this purpose we provide different technical interfaces (APIs).

Interfaces

Currently the Digitised Collections provide two interfaces, OAI-PMH and IIIF.

1. OAI-PMH – https://oai.sbb.berlin

Retrieval of metadata for objects in the digitised collections is provided via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard. A wide range of client applications for OAI-PMH in numerous programming languages is freely available on the web.

The base URL for the OAI-PMH endpoint of the digitised collections of the SBB is
https://oai.sbb.berlin/

Requests are built from the six verbs defined by OAI-PMH: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords and GetRecord.
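
As a minimal Python sketch, a request URL for each of the six OAI-PMH verbs can be composed from the base URL and a query string. The helper name `oai_request_url` is an assumption for illustration; which optional arguments the SBB endpoint accepts for each verb is not covered here.

```python
from urllib.parse import urlencode

OAI_BASE_URL = "https://oai.sbb.berlin/"

# The six verbs defined by the OAI-PMH 2.0 standard.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(verb: str, **arguments: str) -> str:
    """Compose an OAI-PMH request URL; argument values are percent-encoded."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    return OAI_BASE_URL + "?" + urlencode({"verb": verb, **arguments})


print(oai_request_url("Identify"))
print(oai_request_url("ListMetadataFormats"))
print(oai_request_url(
    "GetRecord",
    metadataPrefix="oai_dc",
    identifier="oai:digital.staatsbibliothek-berlin.de:PPN867445300",
))
```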

The SBB implements Dublin Core (DC) for basic bibliographic metadata and METS for all metadata about the contents and structure of a digital object.

By combining OAI-PMH verbs with the DC metadata format, more specific requests can be formulated.
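
One example of a more specific request is a selective harvest: ListIdentifiers with metadataPrefix=oai_dc, restricted by date, and continued with the resumptionToken that OAI-PMH returns for long result lists. The Python sketch below illustrates this. The function names are assumptions, the date restriction via `from` is the generic OAI-PMH argument and has not been verified against the SBB endpoint, and the download function is passed in so the logic can be tried without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_BASE_URL = "https://oai.sbb.berlin/"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_fetch(url: str) -> bytes:
    """Download one OAI-PMH response (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def list_identifiers(selectors: Dict[str, str],
                     fetch: Callable[[str], bytes] = http_fetch) -> Iterator[str]:
    """Yield record identifiers, following resumption tokens page by page.

    selectors -- optional OAI-PMH arguments such as {"from": "2020-11-01"}
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": "oai_dc", **selectors}
    while True:
        root = ET.fromstring(fetch(OAI_BASE_URL + "?" + urlencode(params)))
        for header in root.iterfind(".//oai:header", OAI_NS):
            yield header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
        if not token:
            return  # no token or an empty one: the list is complete
        # A resumption token must be sent without any other selection arguments.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}
```

A call such as `list_identifiers({"from": "2020-11-01"})` returns a generator, so identifiers can be processed while further pages are still being requested.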

The response contains a unique identifier for each digital object, the PPN, e.g. oai:digital.staatsbibliothek-berlin.de:PPN867445300. Using the PPN, additional information about a digital object can be retrieved:
https://oai.sbb.berlin/?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300

By changing the metadata-prefix to mets, the complete METS metadata record containing all references to any related files (images, OCR) can be retrieved
https://oai.sbb.berlin/?verb=GetRecord&metadataPrefix=mets&identifier=oai%3Adigital.staatsbibliothek-berlin.de%3APPN867445300

The METS file contains a <fileSec> section with <fileGrp> child elements, which hold references to the various files that belong to the digital object, typically images in either JPG or PNG format, e.g.
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.jpg
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.png
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.tif
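
The file references can be extracted from a downloaded METS record with a few lines of Python, sketched below. The namespaces are the standard METS and XLink ones; the function name `file_urls_by_group` and the USE value in the sample are assumptions for illustration and may differ from what the SBB records contain.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def file_urls_by_group(mets_xml) -> Dict[str, List[str]]:
    """Map the USE attribute of each <fileGrp> to the URLs of its files."""
    root = ET.fromstring(mets_xml)
    groups: Dict[str, List[str]] = {}
    # './/' also finds the fileSec when the METS is wrapped in an OAI-PMH response.
    for group in root.iterfind(".//mets:fileSec/mets:fileGrp", NS):
        urls = [flocat.get(XLINK_HREF, "")
                for flocat in group.iterfind("mets:file/mets:FLocat", NS)]
        groups[group.get("USE", "")] = urls
    return groups


# Shortened, made-up METS fragment; the USE value "DEFAULT" is hypothetical.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(file_urls_by_group(SAMPLE_METS))
```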

2. IIIF

Retrieval of content (images and full text) from the digitised collections is supported via the International Image Interoperability Framework (IIIF) protocol. Here, too, a growing number of free IIIF clients and libraries in numerous programming languages is available on the web.

Currently, digitised images, metadata and fulltext data can be retrieved by requests following this schema:
https://content.staatsbibliothek-berlin.de/dc/{PPN}-{Page}

The PPN is a unique ID for every work that can be found in the digitised collections.

To get the scanned images for a specific object, further parameters following the IIIF protocol have to be provided in the URL:
/full/{width in pixel},/0/default.jpg, where the height is adjusted automatically to the given width in pixels, e.g.
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/1200,/0/default.jpg
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/800,/0/default.jpg
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/250,/0/default.jpg
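
To request several pages of one work at the same width, the {PPN}-{Page} schema can be wrapped in a small generator, sketched here in Python. The function name `page_image_urls` is hypothetical, and the zero-padding of the page number to eight digits is an assumption inferred from the example URLs above.

```python
from typing import Iterable, Iterator

DC_BASE_URL = "https://content.staatsbibliothek-berlin.de/dc/"


def page_image_urls(ppn: str, pages: Iterable[int], width: int) -> Iterator[str]:
    """Yield one JPG request per page, scaled to the given width in pixels."""
    for page in pages:
        # Page numbers appear zero-padded to eight digits in the examples.
        yield f"{DC_BASE_URL}{ppn}-{page:08d}/full/{width},/0/default.jpg"


# Thumbnails (250 pixels wide) for the first three pages of one work.
for url in page_image_urls("PPN867445300", range(1, 4), 250):
    print(url)
```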

The IIIF protocol permits further image manipulations via the URL, e.g. cropping, resizing and rotating the image. The following example delivers an image detail of 300 x 300 px, rotated by 90°.
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/100,100,300,300/full/90/default.png

Furthermore, it is possible to get the original TIFF image. Just request the full size and change default.jpg to default.tif in the URL:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-00000001/full/full/0/default.tif

For even more ways of manipulating single images, have a closer look at the IIIF Image API 2.1.1, which is implemented by the content server.

Additionally, the content server can deliver more data for specific works. An overview of the additional functions can be found in the NGCS routes documentation. This includes, for example, dynamic highlighting on the images. The highlighted areas are defined in the same way as sections of the image. As a further parameter, a colour can be specified as a hex code:
https://content.staatsbibliothek-berlin.de/dc/PPN646236717-00000011/full/1200,/0/default.jpg?highlight=55,100,120,100|1150,460,110,80&highlightColor=ff0000
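
The highlight parameters can be generated from a list of rectangles, as in the following Python sketch. The function name `highlighted_image_url` is an assumption; the parameter names `highlight` and `highlightColor` and the separators are taken from the example URL above, and commas and pipes are deliberately left unencoded so that the result has the same form as that example.

```python
from typing import Iterable, Tuple
from urllib.parse import urlencode

Box = Tuple[int, int, int, int]  # x, y, width, height, as for an image section


def highlighted_image_url(image_url: str, boxes: Iterable[Box],
                          color: str = "ff0000") -> str:
    """Append highlight rectangles and a hex colour to an image URL."""
    highlight = "|".join(",".join(str(v) for v in box) for box in boxes)
    query = urlencode({"highlight": highlight, "highlightColor": color}, safe=",|")
    return f"{image_url}?{query}"


IMAGE = ("https://content.staatsbibliothek-berlin.de/dc/"
         "PPN646236717-00000011/full/1200,/0/default.jpg")
print(highlighted_image_url(IMAGE, [(55, 100, 120, 100), (1150, 460, 110, 80)]))
```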

As specified in the IIIF Presentation API the IIIF manifest file of the object can be retrieved with the URL:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300/manifest
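
A manifest is a JSON document, so the pages of a work can be listed without a viewer. The Python sketch below assumes the layout of the IIIF Presentation API 2.x (sequences, canvases, images); whether the SBB manifests follow exactly this version is an assumption, and the function names are illustrative.

```python
import json
from typing import List, Tuple
from urllib.request import urlopen


def load_manifest(url: str) -> dict:
    """Download and decode a manifest (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def canvas_images(manifest: dict) -> List[Tuple[str, str]]:
    """Return (canvas label, image URL) pairs in reading order.

    Assumes the Presentation API 2.x structure:
    sequences -> canvases -> images -> resource["@id"].
    """
    result = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                result.append((canvas.get("label", ""), resource.get("@id", "")))
    return result
```

With network access, `canvas_images(load_manifest("https://content.staatsbibliothek-berlin.de/dc/PPN867445300/manifest"))` would return one entry per page image, provided the manifest has the assumed structure.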

This manifest can be loaded in any IIIF viewer, e.g. in the Mirador viewer hosted by the SBB:
https://mirador.staatsbibliothek-berlin.de/?manifest=https://content.staatsbibliothek-berlin.de/dc/PPN897443810/manifest&manifest=https://content.staatsbibliothek-berlin.de/PPN876457189/manifest

In addition to the manifest, the metadata can also be retrieved in the METS/MODS format:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300.mets.xml

Fulltexts

By adding the page number and the suffix ocr.xml to the URL, the OCR file in ALTO format is delivered for each page:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-0009.ocr.xml for page 9,
https://content.staatsbibliothek-berlin.de/dc/PPN867445300-0010.ocr.xml for page 10 etc.

The OCR files can also be downloaded all at once, packed in a ZIP file:
https://content.staatsbibliothek-berlin.de/dc/PPN867445300.ocr.zip
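
A downloaded ZIP file can be unpacked in memory with the Python standard library, as sketched below. The function name `read_ocr_zip` is an assumption, and since the file names inside the archive are not documented here, the sketch simply returns every entry ending in .xml.

```python
import io
import zipfile
from typing import Dict


def read_ocr_zip(zip_bytes: bytes) -> Dict[str, bytes]:
    """Return the XML files of an OCR ZIP archive, keyed by file name.

    Entries are returned in sorted name order; other files are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name)
                for name in sorted(archive.namelist())
                if name.lower().endswith(".xml")}
```

The bytes can come from any HTTP client, for example `urllib.request.urlopen(url).read()`; each returned value is one ALTO document.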

Conditions of Use

SBB pursues an Open Data Policy and endeavours to make all digitised works published before 1920 available to the public under a Public Domain Mark 1.0 licence. In exceptional cases and for works published later than 1920, different licences may be used.

You can find the license that applies to an object in the Digitized Collections by displaying the complete bibliographic information about the object and scrolling down to the entry license / rights info.

Of course, you can also find this information in the metadata in METS format under <mods:accessCondition>
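
Reading the licence from the metadata can be sketched in Python as follows. The MODS namespace is the standard one; the function name `access_conditions`, the sample record and its `type` attribute value are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def access_conditions(mets_xml) -> List[Tuple[Optional[str], str]]:
    """Return (type attribute, text) for every <mods:accessCondition>."""
    root = ET.fromstring(mets_xml)
    return [(element.get("type"), (element.text or "").strip())
            for element in root.iterfind(".//mods:accessCondition", MODS_NS)]


# Made-up fragment; real records contain many more elements.
SAMPLE_RECORD = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                             xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec><mets:mdWrap><mets:xmlData><mods:mods>
    <mods:accessCondition type="use and reproduction">Public Domain Mark 1.0</mods:accessCondition>
  </mods:mods></mets:xmlData></mets:mdWrap></mets:dmdSec>
</mets:mets>"""

print(access_conditions(SAMPLE_RECORD))
```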

Special data sets

For the Hackathon Coding Gender: Women In Cultural Data, which took place at the end of August 2019, thematic datasets were provided, which are described and listed here.

Contact