Ellinor Lindqvist: Humble Pleas from the Archives: Automatic Analysis and Information Extraction from Historical Petitions

Date: 12 June 2026, 13:15
Location: Humanistiska teatern, Thunbergsvägen 3H, Uppsala
Type: Thesis defence
Thesis author: Ellinor Lindqvist
External reviewer: Michael Piotrowski
Supervisors: Joakim Nivre, Eva Pettersson
Research subject: Computational Linguistics
Publication: https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-584819

Abstract

Historical archives offer rich insights into past lived experiences, yet linguistic variation, non-standard orthography, and limited annotated resources challenge computational analysis. This dissertation investigates the application of natural language processing (NLP) to historical text, focusing on 18th-century Swedish petitions. The work explores how different modelling paradigms can support both genre identification and extraction of historically meaningful information.

The research follows a stepwise methodology across four studies. First, petitions are classified alongside other historical text types using approaches ranging from traditional machine learning models to a Swedish BERT-based classifier, achieving strong results on in-domain data. Second, feature analysis is used to identify key linguistic markers of the petition genre, including thematic vocabulary and expressions of social hierarchy. Third, automatic methods for identifying rhetorical components, such as salutations and requests, are explored using both low-resource techniques and large language models (LLMs), showing that while formulaic sections can be reliably detected, other parts remain inherently ambiguous. The inclusion of an English dataset further enables evaluation of cross-linguistic generalisation.

Finally, the dissertation addresses extraction and phrase normalisation of work-related expressions within the Gender and Work (GaW) framework. Experiments with large language models show promising results: although exact phrase matching is weak, string-level and semantic similarity indicate that models can locate relevant topical regions. Qualitative analysis further shows that models can detect plausible work-related expressions not present in the gold data, pointing towards hybrid human–machine workflows for improving coverage in historical research.

A key contribution of this work is the application of evaluation strategies that move beyond exact matching to incorporate string-level and semantic similarity, enabling a more nuanced assessment of model performance on noisy historical text. Overall, the findings highlight both the potential and limitations of current NLP methods for historical text, and demonstrate how computational approaches can support the analysis of complex archival material.
