Computational Linguistics
The Computational Linguistics group at Uppsala University does research on computational modeling of natural language and practical applications involving natural language processing (NLP).
Computational Linguistics, or Language Technology, is an interdisciplinary field dealing with the computational modeling of natural language. Traditionally, research has been driven both by the theoretical goal of understanding human language and by practical applications involving natural language processing, such as systems for automatic translation, information retrieval, and human-computer dialogue. Currently, the emergence of large language models and generative AI is reshaping the field.
The Computational Linguistics group at Uppsala University has a broad research agenda with two focus areas: digital philology and multilingual NLP. Digital philology deals with computational methods for the interpretation of text, with applications like historical text processing, historical cryptography, digital literary studies, and handwritten text recognition. Multilingual NLP deals both with inherently multilingual tasks like machine translation and with the use of multilingual resources to support low-resource languages in tasks like dependency parsing. The group has contributed to the development of several resources and tools, including Universal Dependencies (morphosyntactically annotated corpora), PARSEME (corpora annotated for multiword expressions), the Swedish Diachronic Corpus, and UUParser (a data-driven dependency parser).


Joakim, Meriem, Fredrik, Eva, Sara, Irene, Mats, Luise, Johan, and Ahmed.
Group members
Meriem Beloucif, associate senior lecturer
Mats Dahllöf, senior lecturer
Luise Dürlich, PhD student
Ellinor Lindqvist, PhD student
Beata Megyesi, professor
Irene Miani, PhD student
Joakim Nivre, professor
Eva Pettersson, researcher
Ahmed Ruby, PhD student
Johan Sjons, lecturer
Sara Stymne, senior lecturer
Anna Sågvall Hein, professor emerita
Fredrik Wahlberg, associate senior lecturer
Oreen Yousuf, PhD student
Publications
Author gender and text characteristics in contemporary Swedish fiction
Part of Language and Literature, p. 69-100, 2024
Branch-GAN: Improving Text Generation with (not so) Large Language Models
Part of The Twelfth International Conference on Learning Representations, 2024
Continual Learning Under Language Shift
Part of Text, Speech, and Dialogue (TSD 2024), p. 71-84, 2024
Part of Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), p. 253-263, 2024
ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality
Part of Advances in Information Retrieval (ECIR 2024), 2024
Function Words in Universal Dependencies
Part of Linguistic Analysis, p. 549-588, 2024
Investigating the Role of Prosody in Disambiguating Implicit Discourse Relations in Egyptian Arabic
p. 926-930, 2024
Keys with nomenclatures in the early modern Europe
Part of Cryptologia, p. 97-139, 2024
Models and Strategies for Russian Word Sense Disambiguation: A Comparative Analysis
Part of Text, Speech, and Dialogue (TSD 2024), 2024
Orden som avslöjar författaren
Part of Språktidningen, p. 55-57, 2024
Overview of ELOQUENT 2024 – Shared Tasks for Evaluating Generative Language Model Quality
Part of Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2024), 2024
Overview of the CLEF-2024 Eloquent Lab: Task 2 on HalluciGen
Part of Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), p. 691-702, 2024
European Language Resources Association, 2024
Relation between Cross-Genre and Cross-Topic Transfer in Dependency Parsing
Part of Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), p. 13879-13884, 2024
Part of Selected papers from the CLARIN Annual Conference 2023, 2024
UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies
Part of Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), p. 16919-16932, 2024
UniDive: A COST Action on Universality, Diversity and Idiosyncrasy in Language Technology
Part of Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, p. 372-382, 2024
Using LLMs to Build a Database of Climate Extreme Impacts
Part of Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024), p. 93-110, 2024
AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
Part of Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, p. 13968-13981, 2023
BERTie Bott's Every Flavor Labels: A Tasty Introduction to Semantic Role Labeling for Galician
Part of Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, p. 10892-10902, 2023
Historical Language Models in Cryptanalysis: Case Studies on English and German
Part of Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023, 2023
Improving Translation Quality for Low-Resource Inuktitut with Various Preprocessing Techniques
Part of Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, p. 475-479, 2023
Investigating UD Treebanks via Dataset Difficulty Measures
Part of Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, p. 1076-1089, 2023
Low-Resource Techniques for Analysing the Rhetorical Structure of Swedish Historical Petitions
Part of Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), p. 132-139, 2023
Multilingual Automatic Speech Recognition for Scandinavian Languages
Part of Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), p. 460-466, 2023
On the Concept of Resource-Efficiency in NLP
Part of Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), p. 135-145, 2023
Part of Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), p. 24-35, 2023
PARSEME Meets Universal Dependencies: Getting on the Same Page in Representing Multiword Expressions
Part of Northern European Journal of Language Technology (NEJLT), 2023
Parser Evaluation for Analyzing Swedish 19th–20th Century Literature
Part of Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), p. 335-346, 2023
2023
SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Part of Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), p. 2319-2337, 2023
2023
Towards Data-effective Educational Question Generation with Prompt-based Learning
Part of Proceedings of 2023 Computing Conference, 2023
UD-MULTIGENRE: a UD-Based Dataset Enriched with Instance-Level Genre Annotations
Part of Proceedings of the 3rd Workshop on Multi-lingual Representation Learning (MRL), p. 253-267, 2023
Part of Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023), p. 126-144, 2023
Uppsala University at SemEval-2023 Task12: Zero-shot Sentiment Classification for Nigerian Pidgin Tweets
Part of Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), p. 1491-1497, 2023
Using Wikidata for Enhancing Compositionality in Pre-trained Language Models
Part of Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, p. 170-178, 2023
What Causes Unemployment?: Unsupervised Causality Mining from Swedish Governmental Reports
Part of Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), p. 25-29, 2023
What is the Code for the Code? Historical Cryptology Terminology
Part of Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023, 2023
A Tale of Four Parsers: Methodological Reflections on Diagnostic Evaluation and In-Depth Error Analysis for Meaning Representation Parsing
Part of Language Resources and Evaluation, p. 1075-1102, 2022
Cause and Effect in Governmental Reports: Two Data Sets for Causality Detection in Swedish
Part of Proceedings of the First Workshop on Natural Language Processing for Political Sciences (PoliticalNLP), p. 46-55, 2022
Exploring Cross-Lingual Transfer to Counteract Data Scarcity for Causality Detection
Part of WWW '22, p. 501-508, 2022
Few shots are all you need: A progressive learning approach for low resource handwritten text recognition
Part of Pattern Recognition Letters, p. 43-49, 2022
Identifying Cleartext in Historical Ciphers
Part of Proceedings of the Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022., 2022
Lost in Transcription of Graphic Signs in Ciphers
Part of Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022, p. 153-158, 2022
Nucleus Composition in Transition-Based Dependency Parsing
Part of Computational Linguistics, p. 849-886, 2022
Proceedings of the 5th International Conference on Historical Cryptology
2022
Quotation and Narration in Contemporary Popular Fiction in Swedish: Stylometric Explorations
Part of Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), p. 203-211, 2022
Schrödinger's tree: On syntax and neural language models
Part of Frontiers in Artificial Intelligence, 2022
Contact
- info@lingfil.uu.se