A man holds up his hands, but the book floats in the air half a metre in front of him. An illustrated cone shows how his field of vision is directed at the book.

Illustration: Anna-Maria Hällgren

Uppsala Computational Literary Studies Group

The computational literary studies group at Uppsala University (UCOL) was initiated in 2017 as a two-year collaborative project (funded by UU), which included scholars in literature, the Scandinavian languages, and computational linguistics. The project was fruitful, and in 2020 a more permanent research group was established.

The main focus of UCOL is to deploy and develop computational methods for the investigation of Swedish literature and its contexts. This covers a broad range of research questions, methods, and materials: from literary stylistics and quantitative approaches to narratives to the sociology of literature and textual scholarship; from 19th-century classics to contemporary popular fiction; from small or even single-novel corpora to large-scale datasets; from basic word counts and descriptive statistics to complex machine learning algorithms.

Research group

Affiliated researchers

  • Sarah Allison (Department of English, Loyola University, New Orleans, USA)
  • Karina van Dalen-Oskam (Department of Literary Studies, Huygens Institute and University of Amsterdam, the Netherlands)


Research projects

At the moment, the work in UCOL is focused on three larger research projects:


Chart showing patterns in streaming data. A peak recurs annually in 2015, 2016, and 2017.

Example of streaming pattern data

"Patterns of Popularity: Towards a Holistic Understanding of Contemporary Bestselling Fiction" aims to investigate the most popular contemporary novels at scale, and through a combination of empirical approaches, covering digital text material (ebooks), contextual and book trade material, and reader consumption data. The ambition is to find out in what ways bestsellers stand out, and how formats such as the audiobook affect writing styles and narratives. The project includes a collaboration with Storytel that provides access to data points on real-time book consumption, a dataset that enable new ways to merge publishing studies and readership studies.

Participants: Karl Berglund (PI), Mats Dahllöf
Duration: 2020–2023
Funder: Swedish Research Council



"Fictional Prose and Language Change: The Role of Colloquialization in the history of Swedish 1830–1930" aims to investigate if language change in Swedish in the 19th century was driven by fiction and its move towards naturalism (The Modern Breakthrough). Since it has been claimed that colloquialization first was expressed in fictional prose, the project focuses on stylistic variability in literary texts and investigates whether colloquial linguistic features have spread from dialogue to narrative by developing and using digital methods of corpus stylistics in large scale materials. The empirical point of departure is Litteraturbanken, a corpus of >4200 Swedish works from 1650 to 1940.

Participants: David Håkansson (PI), Sara Stymne, Johan Svedjedal, Carin Östman
Duration: 2021–2023
Funder: Swedish Research Council


Portrait of Selma Lagerlöf, with lined paper in the background and ones and zeros across most of the image.

Illustration: Anna-Maria Hällgren

Pencil drawing of Astrid Lindgren with shorthand in the background

Illustration: Jenny Jansson

"The Astrid Lindgren Code: Accessing Astrid Lindgren’s shorthand manuscripts through handwritten text recognition, media history, and genetic criticism" explores a material previously untouched by research. It does so primarily through the combination of two digital methods: development and adaptation of algorithms for handwritten text recognition (HTR), and crowd/expert sourcing. The project utilises the joint competences of literary scholars, computer scientists, and professional stenographers to unlock the potential of Lindgren’s original drafts, enable a starting point for full digitalisation and transliteration of Lindgren’s original manuscripts, and provide a general vehicle for methodological development for analysis of handwritten documents.

Participants: Malin Nauwerck (PI), Karolina Andersdotter, Anders Hast, Raphaela Heil
Duration: 2020–2022
Funder: Riksbankens jubileumsfond (RJ)
Project website