Conversion of Catalog Cards Using Vision Language Models

As part of the Berlin State Library’s digitization strategy, approximately 200,000 typewritten catalog cards from the Catalog of Music Books and Scores II (Katalog der Musikbücher und Noten II) of the Music Department are to be processed. A pilot project jointly conducted by the Music Department and the Stabi Lab serves to develop and test an appropriate workflow using modern Vision Language Models (VLMs), as well as to determine the financial and human resources required for implementing the overall project.

The system under development operates in two consecutive processing phases. First, the image files are converted into machine-readable full text using optical character recognition (OCR). Second, these texts are automatically structured into standardized data formats. The processing logic is designed to prepare the transfer of individual catalog cards into the library’s catalog system. As a penultimate step, the extracted data are reviewed, enriched, and cleaned using OpenRefine; only then are they imported into the catalog system.
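
The two phases can be illustrated with a minimal Python sketch. It is not the project's implementation: the record layout, the field names, and the functions `process_card`, `fake_transcribe`, and `fake_structure` are hypothetical, and the two stand-in functions return invented sample text where a real setup would call a VLM.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class CardRecord:
    """Result for one catalog card (field names are illustrative assumptions)."""
    source_image: str
    full_text: str
    fields: dict[str, str]


def process_card(
    image_path: str,
    transcribe: Callable[[str], str],
    structure: Callable[[str], dict[str, str]],
) -> CardRecord:
    """Run both phases for a single card image."""
    full_text = transcribe(image_path)  # phase 1: image -> full text
    fields = structure(full_text)       # phase 2: full text -> structured fields
    return CardRecord(image_path, full_text, fields)


def to_json(record: CardRecord) -> str:
    """Serialize a record, e.g. for later review in a tool such as OpenRefine."""
    return json.dumps(asdict(record), ensure_ascii=False)


# Hypothetical stand-ins for the model calls; the card text is invented.
def fake_transcribe(image_path: str) -> str:
    return "Example, Composer\nExample title"


def fake_structure(full_text: str) -> dict[str, str]:
    heading, title = full_text.splitlines()
    return {"heading": heading, "title": title}
```

Passing the two phases in as callables keeps the sketch independent of any particular model or API, so either phase could be swapped out without touching the other.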

The workflow can be interrupted and resumed as needed, since previously processed files are automatically skipped when restarting. This enables flexible and fault-tolerant processing without data loss or duplication of work. Of particular importance is the possibility of making a large collection accessible in structured form for the first time. Automated processing makes it possible to generate data on a scale that is currently not feasible through purely manual cataloging, while also ensuring consistent data quality and format uniformity.
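
One common way to obtain the restart behavior described above is to write one output file per card and to skip every card whose output already exists. The following sketch assumes that approach; the directory layout, the `.jpg` and `.json` file extensions, and the function name `run_batch` are assumptions made for illustration, not details taken from the project.

```python
import json
from pathlib import Path
from typing import Callable


def run_batch(
    image_dir: Path,
    output_dir: Path,
    process: Callable[[Path], dict],
) -> list[Path]:
    """Process all images that do not yet have an output file.

    Returns the images handled in this run, so a restarted run
    reports only the work that was actually still open.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    newly_processed: list[Path] = []

    for image_path in sorted(image_dir.glob("*.jpg")):
        target = output_dir / f"{image_path.stem}.json"
        if target.exists():
            continue  # already processed in an earlier run

        result = process(image_path)

        # Write to a temporary file first and rename afterwards, so that an
        # interruption mid-write never leaves a half-written result that
        # would be mistaken for a finished one on the next run.
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
        tmp.replace(target)

        newly_processed.append(image_path)

    return newly_processed
```

Calling `run_batch` a second time on the same directories returns an empty list, because every card already has its output file.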

A central component of the project is the evaluation of the generated data. For this purpose, samples of processed catalog cards are manually reviewed and compared with the automatically generated datasets. The evaluation includes, among other aspects:

recognition accuracy (OCR quality),
correct assignment of metadata fields,
consistency of the data structure, and
usability of the data within the target system.

In addition, typical error classes are identified in order to iteratively improve the processing logic. The evaluation thus serves not only quality assurance but also the continuous refinement of the workflow.
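
The first two evaluation criteria lend themselves to simple numeric measures when a manually reviewed sample is available. The sketch below uses the character error rate (edit distance divided by the length of the reference text) for recognition accuracy and the share of exactly matching fields for metadata assignment. Both are common choices, but whether the project uses these particular measures is an assumption, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance relative to the length of the manually checked reference."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def field_accuracy(reference: dict[str, str], predicted: dict[str, str]) -> float:
    """Share of reference fields whose predicted value matches exactly."""
    if not reference:
        return 1.0
    correct = sum(1 for key, value in reference.items() if predicted.get(key) == value)
    return correct / len(reference)
```

Exact matching is deliberately strict: a field that is correctly assigned but contains a single OCR error counts as wrong, so the two measures are best read together.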

For further information about the project, please contact:

Dr. Andreas Janke (Head of “Research Data and Digital Humanities” | Music Department)
Dr. Roman Kuhn (Stabi Lab)
Dorian Grosch (Stabi Lab)
