Computational Linguistics
The Computational Linguistics group at Uppsala University does research on computational modeling of natural language and practical applications involving natural language processing (NLP).
Computational Linguistics, or Language Technology, is an interdisciplinary field dealing with the computational modeling of natural language. Traditionally, research has been driven both by the theoretical goal of understanding human language and by practical applications involving natural language processing, such as systems for automatic translation, information retrieval, and human-computer dialogue. Currently, the emergence of large language models and generative AI is reshaping the field.
The Computational Linguistics group at Uppsala University has a broad research agenda with two focus areas: digital philology and multilingual NLP. Digital philology deals with computational methods for the interpretation of text, with applications such as historical text processing, historical cryptography, digital literary studies, and handwritten text recognition. Multilingual NLP deals both with inherently multilingual tasks like machine translation and with the use of multilingual resources to support low-resource languages in tasks like dependency parsing. The group has been involved in the development of a number of resources and tools, such as Universal Dependencies (morphosyntactically annotated corpora), PARSEME (corpora annotated for multiword expressions), the Swedish Diachronic Corpus, and UUParser (a data-driven dependency parser).
Group members
Meriem Beloucif, associate senior lecturer
Mats Dahllöf, senior lecturer
Luise Dürlich, PhD student
Ellinor Lindqvist, PhD student
Beata Megyesi, professor
Irene Miani, PhD student
Joakim Nivre, professor
Eva Pettersson, researcher
Ahmed Ruby, PhD student
Johan Sjons, lecturer
Sara Stymne, senior lecturer
Anna Sågvall Hein, professor emerita
Fredrik Wahlberg, associate senior lecturer
Oreen Yousuf, PhD student
Publications
Author gender and text characteristics in contemporary Swedish fiction
Part of Language and Literature, p. 69-100, 2024
Continual Learning Under Language Shift
Part of Text, Speech, and Dialogue, p. 71-84, 2024
Keys with nomenclatures in the early modern Europe
Part of Cryptologia, p. 97-139, 2024
Orden som avslöjar författaren
Part of Språktidningen, p. 55-57, 2024
Part of Selected papers from the CLARIN Annual Conference 2023, 2024
AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
Part of Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, p. 13968-13981, 2023
BERTie Bott's Every Flavor Labels: A Tasty Introduction to Semantic Role Labeling for Galician
Part of Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, p. 10892-10902, 2023
Historical Language Models in Cryptanalysis: Case Studies on English and German
Part of Proceedings of the 6th International Conference on Historical Cryptology (HistoCrypt 2023), 2023
Improving Translation Quality for Low-Resource Inuktitut with Various Preprocessing Techniques
Part of Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, p. 475-479, 2023
Investigating UD Treebanks via Dataset Difficulty Measures
Part of Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, p. 1076-1089, 2023
Low-Resource Techniques for Analysing the Rhetorical Structure of Swedish Historical Petitions
Part of Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), p. 132-139, 2023
Multilingual Automatic Speech Recognition for Scandinavian Languages
Part of Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), p. 460-466, 2023
On the Concept of Resource-Efficiency in NLP
Part of Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), p. 135-145, 2023
Part of Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023), p. 24-35, 2023
PARSEME Meets Universal Dependencies: Getting on the Same Page in Representing Multiword Expressions
Part of Northern European Journal of Language Technology (NEJLT), 2023
Parser Evaluation for Analyzing Swedish 19th–20th Century Literature
Part of Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), p. 335-346, 2023
SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)
Part of Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), p. 2319-2337, 2023
Towards Data-effective Educational Question Generation with Prompt-based Learning
Part of Proceedings of 2023 Computing Conference, 2023
UD-MULTIGENRE: a UD-Based Dataset Enriched with Instance-Level Genre Annotations
Part of Proceedings of the 3rd Workshop on Multi-lingual Representation Learning (MRL), p. 253-267, 2023
Part of Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), p. 1491-1497, 2023
Using Wikidata for Enhancing Compositionality in Pre-trained Language Models
Part of Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, p. 170-178, 2023
What Causes Unemployment?: Unsupervised Causality Mining from Swedish Governmental Reports
Part of Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023), p. 25-29, 2023
What is the Code for the Code? Historical Cryptology Terminology
Part of Proceedings of the 6th International Conference on Historical Cryptology (HistoCrypt 2023), 2023
A Tale of Four Parsers: Methodological Reflections on Diagnostic Evaluation and In-Depth Error Analysis for Meaning Representation Parsing
Part of Language Resources and Evaluation, p. 1075-1102, 2022
Cause and Effect in Governmental Reports: Two Data Sets for Causality Detection in Swedish
Part of Proceedings of the First Workshop on Natural Language Processing for Political Sciences (PoliticalNLP), p. 46-55, 2022
Exploring Cross-Lingual Transfer to Counteract Data Scarcity for Causality Detection
Part of WWW '22, p. 501-508, 2022
Part of Pattern Recognition Letters, p. 43-49, 2022
Identifying Cleartext in Historical Ciphers
Part of Proceedings of the Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA 2022), 2022
Lost in Transcription of Graphic Signs in Ciphers
Part of Proceedings of the 5th International Conference on Historical Cryptology (HistoCrypt 2022), p. 153-158, 2022
Nucleus Composition in Transition-Based Dependency Parsing
Part of Computational Linguistics, p. 849-886, 2022
Proceedings of the 5th International Conference on Historical Cryptology
2022
Quotation and Narration in Contemporary Popular Fiction in Swedish: Stylometric Explorations
Part of Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), p. 203-211, 2022
Schrödinger's tree: On syntax and neural language models
Part of Frontiers in Artificial Intelligence, 2022
SLäNDa Version 2.0: Improved and Extended Annotation of Narrative and Dialogue in Swedish Literature
Part of Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC 2022), p. 5324-5333, 2022
Part of CLARIN, p. 561-585, Walter de Gruyter, 2022
The DECODE Database of Historical Ciphers and Keys: Version 2
Part of Proceedings of the 5th International Conference on Historical Cryptology (HistoCrypt 2022), p. 111-114, 2022
Part of Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), p. 88-93, 2022
What Was Encoded in Historical Cipher Keys in the Early Modern Era?
Part of Proceedings of the 5th International Conference on Historical Cryptology (HistoCrypt 2022), 2022
A Mention-Based System for Revision Requirements Detection
Part of Proceedings of the 1st Workshop on Understanding Implicit and Underspecified Language, p. 58-63, 2021
Attention Can Reflect Syntactic Structure (If You Let It)
Part of Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, p. 3031-3045, 2021
Audiobook stylistics: Comparing print and audio in the bestselling segment
Part of Journal of Cultural Analytics, p. 1-30, 2021
Deciphering Papal Ciphers from the 16th to the 18th Century
Part of Cryptologia, p. 479-540, 2021
Investigation of Transfer Languages for Parsing Latin: Italic Branch vs. Hellenic Branch
Part of Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), p. 315-320, 2021
Key Design in the Early Modern Era in Europe
Part of Proceedings of the 4th International Conference on Historical Cryptology (HistoCrypt 2021), 2021
Revealing Secrets from the Past: Studying Historical Ciphers.
2021
Revisiting Negation in Neural Machine Translation
Part of Transactions of the Association for Computational Linguistics, p. 740-755, 2021
SweLL Pseudonymization Guidelines
2021
SweLL transcription guidelines, L2 essays
2021
Contact
- info@lingfil.uu.se