
quod: A Tool for Querying and Organising Digitised Historical Documents

In this blog post we introduce ‘quod’ (querying OCRed documents), a prototype Python-based command line tool for OCRing and querying digitised historical documents, which can be used to organise large collections. To demonstrate its use in context, this post takes the reader through a case study of the International Tracing Service, showing the workflow step by step from start to finish.

The archives of the International Tracing Service (ITS) in Bad Arolsen, Germany, focus on the topics of wartime incarceration, forced labour, and liberated survivors. Within their holdings is a collection of approximately 4.2 million images of documents that have been rearranged without taking into account their provenance. This makes it difficult, or at least highly laborious, to determine, for example, to which subcollection an image originally belonged, or to organise the collections in certain ways.

ITS has rearranged many of its collections to meet the needs of its tracing tasks. As a consequence, it is relatively easy to search for names of people in the collection. At the same time, the loss of provenance and context information makes it hard to approach the collections with research questions. For example, a researcher or archivist may want to identify the employers of forced labourers and even arrange the collections according to these.
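To make this use case concrete, here is a minimal sketch of grouping OCRed documents by the employer names they mention. It is not quod's actual interface: the function name `group_by_employer`, the input format, the employer names, and the sample texts are all hypothetical and invented for illustration.

```python
import re
from collections import defaultdict


def group_by_employer(ocr_texts, employers):
    """Map each employer name to the IDs of documents whose OCR text mentions it.

    ocr_texts: dict of document ID -> OCRed text (hypothetical input format).
    employers: iterable of employer names to search for (hypothetical list).
    """
    groups = defaultdict(list)
    for doc_id, text in ocr_texts.items():
        for name in employers:
            # Case-insensitive whole-phrase match. Real OCR output would
            # likely need fuzzy matching to tolerate recognition errors.
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                groups[name].append(doc_id)
    return dict(groups)


# Invented sample data, for illustration only.
sample_texts = {
    "img_0001": "Beschäftigt bei Firma Mustermann AG seit 1943",
    "img_0002": "Arbeitgeber: Beispielwerke GmbH",
    "img_0003": "Entlassen von der firma mustermann ag",
}
sample_employers = ["Mustermann AG", "Beispielwerke GmbH"]

result = group_by_employer(sample_texts, sample_employers)
```

Exact matching of this kind is only a starting point. With noisy OCR text, an approximate string comparison would probably be needed before such groupings could be trusted for rearranging a collection.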

Learning outcomes

After viewing this training resource, users will be able to:

  • recognise how important metadata and provenance are in the management of archive materials
  • understand how the prototype Python command line tool “quod” can be used to organise large collections within archives

Interested in learning more?

Check out: quod: A Tool for Querying and Organising Digitised Historical Documents


Cite as

Reinier De Valk (2018). quod: A Tool for Querying and Organising Digitised Historical Documents. Version 1.0.0. EHRI. [Training module]. https://blog.ehri-project.eu/2018/03/29/quod/

Reuse conditions

Resources hosted on DARIAH-Campus are subject to the DARIAH-Campus Training Materials Reuse Charter.

Full metadata

Title:
quod: A Tool for Querying and Organising Digitised Historical Documents
Authors:
Reinier De Valk
Domain:
Social Sciences and Humanities
Language:
en
Published:
2/2/2023
Content type:
Training module
Licence:
CC BY 4.0
Sources:
EHRI
Topics:
Big data, Digital Archives, eHeritage, Information Architecture, Metadata, Repositories & Collections
Version:
1.0.0