Digital Media Workflow

An innovative concept for building digital collections suited to text and data mining methods

Brief description

The aim of the project is to develop a prototypical, easily adaptable workflow that enables libraries to aggregate and process digital research publications and, finally, to make them available to researchers in a uniform XML/TEI format.

Providing the documents in this uniform, structured format will make it easier for researchers to compile large text corpora and analyse them with innovative text and data mining methods. It will also improve document retrieval through advanced search options. To ensure that the processed documents can be reused from a legal point of view as well, the project consistently makes them available as openly as possible (i.e. as openly as the license terms of the original documents allow).

As use cases, the project will focus on Open Access publications from scientific publishers and on dissertations by members of TU Darmstadt. The dissertations serve as examples of publications whose submission format libraries can influence.

Delivered as software tools and organisational concepts, the workflow will cover every step from harvesting the documents to making them available in the TEI target format. This includes procedures for checking and documenting the license information of the documents, scripts for harvesting them from different publishing platforms, and procedures for format validation and conversion, cataloguing and long-term archiving.

Thanks to interoperable file formats, the further development of existing software, and the publication of all project results under free licenses, the “Digital Media Workflow” can easily be adapted and used by other infrastructure facilities and can, in principle, be extended to all types of documents.

Project status

Ongoing, 1 November 2022 – 31 October 2025


Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)