The Swedish Graduate School of Digital Philology (DigPhil)

We are excited to announce the launch of the National Graduate School of Early Languages and Digital Philology (DigPhil). This project will establish a cutting-edge doctoral research environment that combines the study of early languages – from pre-historical and ancient to early modern languages and their varieties up to the eighteenth century – with advancements in language technology. Our goal is to educate a new generation of philologists, grounded in the rich history of their discipline while possessing a strong proficiency in language technology. DigPhil is made possible thanks to financial support from the Swedish Research Council, 2023-27 (VR 2022-06343).

What is philology? DigPhil starts from a broad definition of philology as the integrated study of languages and texts within their historical contexts. The field traces its roots back to the Library of Alexandria in the third century BCE and encompasses a dynamic, interdisciplinary interaction between textual criticism, book history, historical linguistics, various forms of textual analysis (meter, rhetoric, narratology, etc.), and hermeneutics.

What is digital philology? Digital philology incorporates digital methods and tools across all aspects of traditional philology. Examples include applying machine learning to restore fragmentary texts, utilizing computational approaches in comparative historical linguistics, conducting automated stylometry or narrative analysis, and extracting information from texts through data-mining techniques. If you're unsure where to start, we have gathered some links to meaningful contributions to digital philology below.

Where is DigPhil? DigPhil is a national graduate school, coordinated by the Faculty of Languages, Uppsala University, in collaboration with Lund University and Stockholm University. Its organizational structure is as follows:

Principal Investigator:
Eric Cullhed (Uppsala University)

Steering Committee:
Mari Bacquin (Lund University)
Elisabet Göransson (Lund University)
Christian Høgel (Lund University)
Jenny Larsson (Stockholm University)
Sofia Lodén (Stockholm University)
Ingela Nilsson (Uppsala University)
Beáta Megyesi (Uppsala University/Stockholm University)
Joakim Nivre (Uppsala University)

For general inquiries, feel free to contact eric.cullhed@lingfil.uu.se or beata.megyesi@lingfil.uu.se.

MEMBERS

Everita Andronova (Baltic languages, Stockholm University)
Variation in the texts of Georg Mancelius. Employing a single-author corpus for the study of language variants
The PhD project examines variation in G. Mancelius's texts in order to trace language change during the 17th century. The linguistic variation in these texts is a complex reflection of at least three interacting factors: 1) direct German influence (in spelling, grammar and lexis), 2) the presence of dialectal features, and 3) the development of the Latvian written language during the 17th century. To study linguistic variation in a historical single-author corpus, the project will apply philological analysis (different registers, source text analysis), corpus linguistics, and quantitative sociolinguistic methods (statistical analysis of variables, variant analysis). Work with the historical data will benefit from applications of digital philology (spelling normalisation, POS analysis). To conduct the analysis, I would like to apply NLP tools for modern Latvian to early Latvian texts. Taking into account certain phonetic and morphological deviations from Modern Standard Latvian, I plan to evaluate, adapt, and use an existing POS tagger (via the Latvian NLP Pipeline as a Service, LV-PIPE, https://nlp.ailab.lv/) for morphological analysis. Variant analysis will highlight the main processes in the development of the Latvian language and detect linguistic and extra-linguistic factors behind the variation.

Mahdî Brecq (Old French philology, Stockholm University)
Par-delà la mobilité, le dévoilement. Étude sur Élie de Saint-Gilles (Beyond Mobility, An Unveiling: A Study of Élie de Saint-Gilles)
My PhD project investigates the chanson de geste Élie de Saint-Gilles (13th c.), focusing on its generic identity and textual transmission from its sole Old French manuscript to its Old Norse adaptation (Elíss saga ok Rósamundu). The central research question concerns how this work negotiates the boundaries between epic and romance through processes of rewriting, translation, and transformation. Combining traditional philological methods with digital approaches, I aim to: 1) present in detail the unicum I am working with, BnF fr. 25516 (codicology, book history); 2) produce a digital edition of Élie de Saint-Gilles (linguistics); 3) rethink the genre of the work (using both traditional and digital approaches to literary studies); and 4) reconsider the Old French work in light of its Scandinavian adaptation (translation studies, literary studies).

Micaella Bruton (Computational linguistics, Stockholm University)
Revelio! Unravelling Historical Secrets with an Integrated Cipher Decryption Pipeline
This project investigates methods for the automatic decryption, alignment, and key extraction of historical ciphertexts. The central research question is whether computational tools can significantly reduce the time necessary for traditional cryptanalysis, which often takes tens to hundreds of hours per text. The approach involves developing individual models for image analysis, cipher type identification and decryption, and text alignment and key extraction; these models are integrated into the final pipeline where end users can select options based on language and estimated year of authorship, then upload an image for analysis. In cases where full automatic decryption is not possible, the system aims to substantially accelerate traditional methods and provide practical support to human cryptanalysts.

Albin Thörn Cleland (Ancient Greek, Lund University)
Machine Philology
How can NLP, programming and machine learning further our knowledge of ancient languages? My research contributes to answering that question in a number of ways: from making the first automatic annotator of Greek vowel length, through teasing out the melody of 2500-year-old songs by operationalizing theories of Greek pitch accent in Python code, to comparing ancient cultures by semantic vector spaces.

Rasmus Bro Clemmensen (Ancient Greek, Lund University)
The Language of Skepticism
Through a mapping of the language of the 2nd century ancient Greek philosopher Sextus Empiricus using digital tools, the project sets out to discover and highlight overlooked connections and developments in the history of Hellenistic philosophy. The project employs and experiments with a series of different metrics in order to measure similarities in language between philosophical writers and aims to combine these with more traditional philological methods in order to provide new readings of both the philosophy of Sextus and conceptual developments in the Hellenistic period.

Emelie Hallenberg (Classical Greek, Uppsala University)
Romantic Patterns
My project focuses on ancient and medieval Greek fiction, a tradition originating in the first century AD and continuing until the fall of the Byzantine Empire in 1453. I perform comparative studies on this small corpus of texts using traditional literary theories combined with computational tools. The overall objective is to show how these narratives continue an old tradition while also drawing inspiration from other cultural and linguistic spheres.

Carl Magnus Juliusson (Comparative literature, Lund University)
Babels återkomst. Georg Stiernhielm och den svenska konstpoesin (The Return of Babel. Georg Stiernhielm and the Swedish art poetry)
The 17th century was a time of significant societal change in Sweden. Using Georg Stiernhielm's Discursus astropoeticus, written in a mixed language, as a starting point, I examine mixed languages, multilingualism, language politics, and poetics during the reigns of Queen Christina and Karl X Gustav. My methodology is interdisciplinary, incorporating insights from history of science and ideas, philosophy, history, and linguistics. I aim to contribute to a deeper understanding of the linguistic changes during this period, providing a fresh perspective on Stiernhielm's presumed language purism and patriotism, as well as Christina's affinity for foreign languages. Moreover, I seek to offer a new approach to reading and describing early modern mixed-language poetry, and to enhance our understanding of the politics behind the emergence of a Swedish art poetry.

Irene Miani (Computational linguistics, Uppsala University)
Automatic Scansion for Alliterative Poetry
The PhD project is focused on implementing an automatic scansion tool for alliterative poetry. Analyzing a poem involves several aspects, such as studying its meter, a system of rules and principles that defines the rhythmic beat of a poem based on established structures. This process, called scansion, aims to identify and mark the rhythmic patterns within a poem's lines. It is a very challenging and time-consuming process that many scholars have tried to ease by implementing automatic scansion tools for several languages. However, none of the languages investigated so far belongs to the alliterative tradition (i.e., Old English, Old Saxon, Old Norse, Old High German, etc.). Therefore, the project's ultimate goal is to implement a tool for such languages. Equally important is the investigation and development of methods for the different subtasks of scansion, along with the evaluation and comparison of different methods. By achieving these goals, the project aims to support students and scholars interested in the study of alliterative poetry and pave the way for the scansion of languages that have not yet been considered.
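
One subtask of scansion for alliterative verse, detecting which stressed words alliterate, can be sketched in a few lines of Python. This is only an illustrative sketch and not the project's tool: it assumes the stressed words (lifts) have already been identified, that stress falls on the first syllable, and that the text uses no length marks. The function names and the vowel inventory are assumptions made for the example. It follows the usual conventions of Germanic alliterative verse, in which any vowel alliterates with any other vowel and the clusters sc, sp and st alliterate only with themselves.

```python
# Illustrative vowel inventory for unmarked Old English text (an assumption).
VOWELS = set("aeiouyæ")
# Clusters that conventionally alliterate only with themselves.
CLUSTERS = ("sc", "sp", "st")


def onset_key(word: str) -> str:
    """Return a key such that two words alliterate if their keys are equal."""
    w = word.lower()
    if w[0] in VOWELS:
        return "V"  # all vowel-initial words share one key
    for cluster in CLUSTERS:
        if w.startswith(cluster):
            return cluster
    return w[0]


def alliterating_lifts(a_verse: list[str], b_verse: list[str]) -> list[str]:
    """Return the a-verse lifts that alliterate with the first b-verse lift."""
    key = onset_key(b_verse[0])
    return [word for word in a_verse if onset_key(word) == key]


# Lifts of Beowulf line 4 ("Oft Scyld Scefing sceaþena þreatum"), given by hand.
print(alliterating_lifts(["Scyld", "Scefing"], ["sceaþena", "þreatum"]))
```

A full scansion tool would also need to locate the lifts and classify the metrical pattern of each half-line, which this sketch leaves out.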

Johan Ulrik Nielsen (Linguistics, Uppsala University)
Digital methods for theory-testing in comparative linguistics: The case of Germanic labialized velars
The main research question concerns the origin and development of the Proto-Germanic labialized dorsal sounds, i.e. the Proto-Germanic outcome of the Indo-European labialized dorsals (*kʷ, *gʷ, *gʷʰ) and of combinations of a dorsal and *u̯, and – as part of this – the status of Pre-Proto-Germanic *gʷ in Proto-Germanic. This means testing etymologies, reconstructed proto-forms, and proposed sound laws and their relative chronologies. To answer the question and compare existing hypotheses, the traditional comparative method is used, but a digital tool will also be constructed that can compare competing hypotheses of sound changes, etymologies and reconstructions from Proto-Indo-European to Proto-Germanic and represent the ambiguity inherent in this field. This tool will encode several, often mutually exclusive, sets of sound changes, etymologies and reconstructed Proto-Germanic forms ("scenarios") from the existing literature that all claim to account for the attested data. No particular scenario is taken as "correct" – doing so would limit the tool's usefulness for other scholars. Instead, with the tool, the scholar can pick and choose etymologies, sound changes and reconstructed words from the scholarly literature and easily be shown the drawbacks and features of the assumptions they make about the development of the sounds and words. This will make it easier for scholars – and myself – to clarify the drawbacks of each hypothesis and even draw some quantitative conclusions on the firm basis of traditional comparative methods. The results from the traditional methods and initial observations from the computational tools will be collected in a monograph.
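
The idea of encoding competing "scenarios" as ordered sets of sound changes can be illustrated with a minimal Python sketch. It is not the tool described above: the `Scenario` class, the `compare` function and the plain string-replacement rules are assumptions made for the example, and scenario B is invented purely to show how two rule sets can diverge. Rule order stands in for relative chronology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A named, ordered list of sound changes (earliest first)."""
    name: str
    rules: tuple[tuple[str, str], ...]  # (source, target) pairs

    def derive(self, proto_form: str) -> str:
        """Apply every rule in chronological order to a proto-form."""
        form = proto_form
        for source, target in self.rules:
            form = form.replace(source, target)
        return form


def compare(scenarios, proto_form: str, attested: str) -> dict:
    """Map each scenario name to (derived form, whether it matches the target)."""
    results = {}
    for scenario in scenarios:
        derived = scenario.derive(proto_form)
        results[scenario.name] = (derived, derived == attested)
    return results


# Scenario A: a heavily simplified textbook-style derivation.
SCENARIO_A = Scenario("A", (("kʷ", "hʷ"), ("d", "t"), ("o", "a")))
# Scenario B: hypothetical, with an invented early delabialization before o.
SCENARIO_B = Scenario("B", (("kʷo", "ko"), ("k", "h"), ("d", "t"), ("o", "a")))

print(compare([SCENARIO_A, SCENARIO_B], "kʷod", "hʷat"))
```

Keeping each scenario as data rather than hard-coding one derivation means no scenario is privileged; a user can swap rules in and out and see at once which reconstructed forms each set of assumptions does and does not account for.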

Signe Rirdance (Baltic philology, Stockholm University)
The difficult birth of written Latvian through translation in the 17th century: The Proverbs of Solomon in Old Latvian by Getzelius, Mancelius and Glück
My project sets out to digitise an early 17th-century manuscript, the Latvian translation of the Proverbs of Solomon by Andreas Getzelius, and to conduct a comparative study of three translations of this book into Latvian within the 17th century, analysing their sources and paratexts. The following questions will be addressed: How did translation strategies into Old Latvian evolve during the 17th century, from the handwritten Proverbs of Solomon manuscript by Getzelius to the printed translations of this book by Mancelius and Glück? How did the written norm of Latvian develop within these translations? What can be learnt from the process of AI-powered recognition and transcription of the handwritten manuscript by Getzelius? Using the case study of the Proverbs of Solomon, my work aims to provide a better understanding of translation ideologies underlying the creation of written Latvian in the 17th century, within the context of vernacular Bible translations in the region. Adding Getzelius' manuscript to the digitally available Old Latvian texts will extend the existing corpora and serve as a pilot for using AI-powered technology in handling handwritten texts in Old Latvian.

Crina Tudor (Computational linguistics, Stockholm University)
Timeless Texts and Technical Trials - Named Entity Recognition and Multilingual Language Modelling for Historical Texts
The present PhD project investigates Named Entity Recognition (NER) in multilingual historical texts, a domain that poses unique challenges due to scarce annotated data, variation in spelling and orthography, OCR-induced noise, and inconsistent annotation guidelines. The core research questions are: (1) how can Large Language Models (LLMs) be adapted to recognize entities in noisy and linguistically diverse historical corpora with minimal supervision, (2) how can the performance of existing LLMs be improved to make them better suited for historical text applications, and (3) how can we develop an end-to-end pipeline (i.e. from image to annotation) for processing historical heritage documents? To address these questions, we aim to combine prompt-based and few-shot learning methods with experiments in cross-lingual transfer learning. We also explore strategies for evaluating model performance when ground truth is limited, noisy, or inconsistent, including the development of more flexible evaluation metrics tailored to historical data. By addressing both the technical side, through optimizing generative LLMs for historical language data, and the methodological side, through approaches for standardizing annotations and evaluating model performance, we aim to contribute to more reliable approaches for entity recognition in cultural heritage materials. The main contribution of this work is a pipeline for the automatic annotation of historical and cultural heritage texts, which will help libraries, archives, and research institutions enrich their collections and make them more accessible. Beyond technical advances, the project will also shed light on best practices for working with multilingual and temporally diverse corpora, thereby supporting the broader field of historical NLP.

Oreen Yousuf (Computational linguistics, Uppsala University)
Handwritten Text Recognition for Ajami Manuscripts
This project applies both computer vision and NLP techniques to better analyze text in Ajami languages across Africa. "Ajami" refers to African languages written in modified versions of the Arabic script. These texts are often historical and may appear in monolingual or bilingual manuscripts. The research question is two-fold. Minor: how to best improve existing text-recognition models for these languages. Major: how to best perform NLP tasks such as topic modeling, summarization, etc. on these languages after digitized text has been curated. To investigate these research questions, we 1) evaluate existing handwritten and print recognition models trained specifically on historical Arabic-script language manuscripts; and 2) curate ground-truth data in 3-5 Ajami languages and utilize existing pre-processing and evaluation resources for the modern versions of these languages. Our contribution is to 1) highlight the poor performance of existing Arabic-script recognition models on Ajami languages, 2) release the first-ever ground-truth datasets for 3-5 Ajami languages, and 3) document the feasibility and limitations of performing different NLP tasks in a very low-resource setting for historical African languages.

AFFILIATED MEMBERS

Anastasiia Alexandrova (Computational linguistics, Uppsala University)
Exploring Language Change in Non-Fictional Texts of Late Modern Swedish Using Computational Linguistics Methods
This PhD study, conducted within the VR-funded research project Language Change and Non-Fictional Texts – A Large-Scale Investigation of Late Modern Swedish, aims to investigate morphosyntactic change in Swedish non-fictional prose from 1800 to 1950 using computational linguistic methods. This period was crucial for the standardization of Swedish and was characterized by a significant expansion of written genres, including scientific, political, practical, and cultural prose. Due to the challenges posed by the data, the project applies post-OCR correction techniques and NLP tools to enable reliable morphosyntactic parsing of diachronic corpora. The study builds on the Universal Dependencies framework, as well as on methods from artificial intelligence and large language models, to trace systematic changes in syntax, morphology, and style across time and domains. The main contribution of the project is to provide a more comprehensive understanding of the historical development of the Swedish language and to reveal the role of non-fictional texts in this process, thereby offering further evidence for the factors underlying language norm establishment.

Chantal Pivetta (Italian Studies/Philology, Lund University)
Uberto e Filomena: A Semi-Diplomatic Edition and Study of a 15th-Century Chivalric Poem in Old Venetian and Old Emilian
This dissertation presents a semi-diplomatic printed edition of Uberto e Filomena, a fifteenth-century poem in ottava rima, written in the Old Venetian and Old Emilian vernaculars and transmitted through a rich and active manuscript tradition. The chosen base manuscript was copied by a woman, an aspect of particular relevance to the study of vernacular textual culture and female scribal agency in the fifteenth century. Her transcription reflects specific linguistic, orthographic, and cultural choices, which are preserved and critically contextualized in the edition. Although the final edition is printed, the research process is grounded in digital philology. The project employs Digital Philology for Dummies (DPhD), a platform I designed and developed in collaboration with a multidisciplinary team. Within this project, DPhD is used to accelerate TEI-XML markup, manage structured textual data through an integrated relational database, and generate dynamic visualizations in EVT3. The relational database enabled a systematic analysis of substantive variants of the poem, leading to the identification of three distinct textual redactions. These relationships are represented through a cladogram, which can be explored interactively, highlighting patterns of scribal intervention, transmission, and adaptation. While the dissertation provides the first scholarly semi-diplomatic edition, a full digital edition, including dynamic apparatus and visualizations, will be published after the thesis.

Gwénaëlle Beynet Fröjd (French Studies/French Philology, Lund University)
Écrits cachés, trésors révélés : Immersion philologique et historique à travers des œuvres françaises préservées par l'aristocratie suédoise du XVe au XIXe siècle (Hidden Writing, Treasures Revealed: Philological and Historical Deep Dive through French Works preserved by the Swedish Aristocracy from the 15th to the 19th Century)
In my dissertation project, I follow the trail of a French duke who fled the French Revolution and settled with Axel von Fersen and his sister Sophie Piper (ca. 1800–1814). When he was partially freed from his diplomatic and military duties, he wrote an erotic work and two salon plays in Sweden. With the help of various historical sources, I have been able to reconstruct part of his route and life, and I will also publish the literary works, which have never been published before. My methodology is interdisciplinary: I use mostly traditional philological methods and investigate the texts using a transnational approach to history, more specifically l'histoire croisée. To publish the texts, I complete my work with literary analyses. My goal is to combine this with digital philology methods for some parts of my work, such as mapping French theatre pieces in Sweden. I aim to explore how philology can complement history to shed light on little-known aspects of the social and cosmopolitan circles of Swedish and French aristocrats during a pivotal period for Europe, from the end of the 18th century to the early 19th century. I seek to examine our subjects of study in order to better understand the dynamics of life in a castle. These subjects, when viewed in relation to one another, will reveal the intimate bonds that united aristocrats, as well as details about their relationships with the outside world, all through a philological approach.

Noa Håkansson (Nordic philology, Uppsala University)
Constructing and transforming history: the rewriting of Old Swedish chronicles from the Middle Ages to the Early Modern Era
This project aims to investigate how historiographical works from the Swedish Middle Ages have been consciously edited and rewritten over time. The project further aims to highlight that these historical-political chronicles have always been written in a specific political context with a specific purpose, but that the traditions of the texts, which often span several hundred years, extend beyond this original context and purpose. The project is thus less interested in the original wording of the texts and focuses instead on tracing the extra-textual and societal changes that have brought about empirically measurable textual changes in the extant manuscripts. In doing so, this project hopes to shed new light on how texts can be reused and repurposed and how this interplays with such things as writing cultures and political ambitions. The study is guided by the following questions: (1) How were historical-political texts rewritten as a result of conscious editing and how can the rewriting process be defined and studied? (2) What patterns and strategies can be identified in the rewriting process? What factors, both internal and external, are decisive? Computational stemmatology will be applied to obtain a baseline overview of the texts' manuscript traditions. Stylometry will then be applied to further analyse the relationship between different versions/redactions of the same textual work. Finally, this will be combined with qualitative close readings that are guided by text-critical principles in order to establish the more intricate details of the rewritings. Since computational studies on Old Swedish material are very scarce, a further perceived need that this study addresses is to provide the field with methodological metadata that can guide and inform future studies that plan to apply computer-assisted methods to Old Swedish textual traditions.

Viktor Johansson (Ancient Greek, Uppsala University)
Ancient Textual Criticism and Textual Ideology: Text, Authority and Critics in the Early Roman Empire
How did textual variation affect the way ancient Greeks and Romans related to their canonical literature? In this thesis, I will: (1) Examine explicit text-critical discussions in Greek and Latin authors from the 6th cent. BCE to the 3rd cent. CE, and explore the role, function, and purpose of text-critical public discourse and its implications for textual ideology, with particular focus on authors from the latter period, such as Galen of Pergamum, Aulus Gellius, and Origen of Alexandria. (2) Explore the use of text-critical arguments employed by ancient authors and how these relate to their textual ideology. (3) Examine how ancient authors viewed text-critical uncertainty as potentially undermining textual authority, and, at the same time, how this very uncertainty could be turned into a means of safeguarding the text's authoritative status. (4) Investigate ancient views on the epistemic status of text-critical arguments.

Wout Sinneave (Computational Linguistics, Oslo University)
Improving runic analysis and textual restoration using NLP models and image analysis
For my PhD project, I will be employing Natural Language Processing (NLP) techniques and image analysis for the analysis and restoration of runic inscriptions. In our first study, we tackled the restoration of runic inscriptions using statistical methods, employing n-gram probabilities and a modified Minimum Edit Distance (MED) algorithm. The approach showed promising results, achieving prediction coverage of 84.97% and accuracy up to 86.96%. In our second study, we employed entropy and MED to quantify spelling variation, comparing proper names and non-names. For our third study, we will move from statistical models to neural models, testing the performance of transformers on runic data and comparing various methods of circumventing the low-resource problem. Finally, we will also attempt to analyse and restore runic tokens on the image level.
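
The two measures named in the second study can be sketched in Python as follows. This is a minimal illustration rather than the project's implementation: it uses the standard Levenshtein distance with unit costs, not the modified MED mentioned above, and the function names and the sample spellings are hypothetical.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance with unit-cost edits."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def spelling_entropy(variants: list[str]) -> float:
    """Shannon entropy (in bits) of the distribution of observed spellings."""
    counts = Counter(variants)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical transliterated spellings of one word, for illustration only.
spellings = ["stain", "stain", "stin", "stein"]
print(spelling_entropy(spellings))
print(edit_distance("stain", "stin"))
```

Higher entropy means the attestations are spread over more competing spellings, while the edit distance measures how far apart any two spellings are; comparing both values for names and non-names is one way to quantify the difference in variation.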
