- Humanities and the arts
- Curatorial and related studies
- History
- Other history and archaeology
- Art studies and sciences
- Artistic design
- Audiovisual art and digital media
- Heritage
- Music
- Theatre and performance
- Visual arts
- Other arts
- Product development
- Study of regions
Many cultural heritage collections are currently going through a phase of mass digitization, in which heritage objects are digitized, catalogued and published at an unprecedented scale using computational means. This process is challenging because of the rapid pace at which it progresses: the digitization of cultural artefacts is in itself an expensive and time-consuming process, and yet, in the end, it only yields low-level data (e.g. raw scans) that has to be supplemented with descriptive metadata to become practically useful (e.g. assigning a period of composition to a painting, or describing the subject of a photograph). This process is known as (semantic) data enrichment. Such metadata is often assigned using thesauri (e.g. the Art and Architecture Thesaurus) that provide a standardized terminology ('controlled vocabularies') with which to characterize cultural artefacts. While crucial to both curation and research, such annotations remain expensive to obtain because they are provided manually by subject experts, who need to master the domain-specific language of individual thesauri.

The field of cultural heritage collections is characterized by an interesting discrepancy. On the one hand, there are the well-known collections of larger institutions, such as the Rijksmuseum or the British Museum, that enjoy a high, international visibility among the general public. Many of these institutions have taken the lead in collection digitization and are increasingly opening up their content to a wider audience in the public domain, often under highly liberal licenses that encourage re-use. On the other hand, the cultural heritage sector abounds in smaller players that lack the funding or manpower to undertake such digitization and cataloguing initiatives manually at the same pace or scale. These smaller players are increasingly experiencing difficulties in handling the incoming quantity of digitized materials.

This gap between large, progressive digital collections and smaller, less advanced ones calls for the following research incentive: it should be possible to automatically extract the knowledge captured in the larger data sets, which are readily available for re-use, in order to support data processing in up-and-coming collections.

The objective of this project is therefore to advance the application of automated algorithms from the field of Artificial Intelligence (AI) to support cultural heritage institutions in their effort to keep up with the ongoing annotation of their expanding digital collections. In particular, we will focus on recent advances in Machine Learning, where the application of neural networks (Deep Learning) has recently led to significant breakthroughs, for instance in the fields of Natural Language Processing and Computer Vision. We will determine how state-of-the-art algorithms can be used to (semi-)automatically catalogue and describe digital objects, especially those for which no, little or incomplete metadata is available. Importantly, our project aims to increase the interoperability of systems by providing 'data-driven export filters' that allow institutions to share their collections as linked open data (Sanderhoff 2014), even if those collections have hitherto, for various practical reasons, relied on closed-vendor, ad hoc metadata systems, e.g. mono-lingual thesauri that are still incompatible with international standards.
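To give a flavour of the (semi-)automatic cataloguing envisaged above, the following is a minimal, purely illustrative sketch in Python, assuming scikit-learn is available. The object descriptions, the controlled-vocabulary terms and the training setup are invented placeholders, not actual collection data or the project's eventual method; in practice the training material would come from the richly annotated records of the larger institutions.

```python
# Minimal sketch: suggest a controlled-vocabulary term for an undescribed
# object, given a handful of records already annotated by curators.
# All texts and labels below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Object descriptions exported from a well-annotated collection (placeholders).
train_texts = [
    "oil on canvas, portrait of a merchant, seventeenth century",
    "etching on paper, river landscape with windmill",
    "glazed earthenware plate with floral decoration",
]
# Thesaurus-style terms assigned by subject experts (placeholders).
train_labels = ["paintings", "prints", "ceramics"]

# TF-IDF features plus a linear classifier: a deliberately simple baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

# Suggest a term for a new, undescribed object; a curator confirms or corrects.
print(model.predict(["copper etching of a harbour scene"])[0])
```

The point of such a pipeline is not the particular classifier but the workflow: suggestions are generated automatically and a subject expert only has to verify them, which is considerably cheaper than annotating from scratch.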
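The notion of a 'data-driven export filter' can be sketched along similar lines: a small, hypothetical mapping from a record in an ad hoc metadata system to linked open data, assuming the rdflib library. The field names, identifiers, base URI and concept id shown are illustrative placeholders only, not an actual institutional mapping.

```python
# Minimal sketch of an export filter: one ad hoc record -> RDF (Turtle),
# using Dublin Core terms and a Getty AAT-style concept URI.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS

AAT = Namespace("http://vocab.getty.edu/aat/")
BASE = Namespace("https://example.org/collection/")  # placeholder base URI

# A record as it might sit in a closed, ad hoc metadata system (placeholder).
record = {
    "id": "obj-0042",
    "title": "River landscape with windmill",
    "object_type_aat": "300000000",  # placeholder concept id
}

g = Graph()
g.bind("aat", AAT)
obj = BASE[record["id"]]
g.add((obj, DCTERMS.title, Literal(record["title"])))
g.add((obj, DCTERMS.type, AAT[record["object_type_aat"]]))
print(g.serialize(format="turtle"))
```

In a realistic filter the mapping from local field names to standard vocabularies would itself be learned or curated per collection, which is precisely where the 'data-driven' aspect comes in.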
Interestingly, many practitioners in the field of cultural heritage seem rather unaware of the significant progress that has recently been made in AI. We therefore aim to raise awareness of the capabilities of present-day AI and to bring heritage management up to speed with recent advances in Machine Learning.