Beáta Megyesi
Professor in Computational Linguistics at the Department of Linguistics and Philology
- Telephone:
- +46 18 471 78 60
- E-mail:
- Beata.Megyesi@lingfil.uu.se
- Visiting address:
- Engelska parken
Thunbergsvägen 3H
- Postal address:
- Box 635
751 26 UPPSALA
- ORCID:
- 0000-0002-4838-6518
Short presentation
I am a professor of computational linguistics, currently on leave from Uppsala University for a professorship at Stockholm University.
My main research areas are natural language processing and digital philology. I conduct research on historical cryptology, developing methods to crack historical ciphers automatically. I also develop tools for analyzing historical and modern texts in various genres, to enable large quantitative studies in the humanities and social sciences.
Keywords
- digital humanities
- historical cryptology
- natural language processing
Biography
Education
- Professor of Computational Linguistics, Department of Linguistics and Philology, Uppsala University, 2021
- Associate Professor in Computational Linguistics, Department of Linguistics and Philology, Uppsala University, 2013
- PhD in Speech Communication, Department of Speech, Music and Hearing, KTH, 2002
- B.A. in Computational Linguistics, Department of Linguistics, Stockholm University, 2000
Appointments
Present:
- Vice-chair and member of the Linguistics review panel at the Swedish Research Council, 2021-2023
- Member of the nominating committee of the Northern European Association for Language Technology – NEALT, 2022-2025
- Vice-chair and member of the board of the Center for Digital Humanities, Uppsala University, 2021-2023
Past:
- President of the Northern European Association for Language Technology – NEALT, 2020-2021
- Head of the Department of Linguistics and Philology, 2009-2018
- Director of the English Park Campus, Uppsala University, 2017-2018
- Vice-president of the Northern European Association for Language Technology – NEALT, 2018-2019
- Member of the board at the Dept. of Linguistics and Philology, 2007-2009, 2010-2012, 2012-2015, 2016-2018
- Member of the board of the Faculty of Languages, Uppsala University, 2008-2011, 2011-2014, 2019-2020
- Director of studies at the Department of Linguistics and Philology, 2007-2009
- Program coordinator for the Language Technology Program, Uppsala University, 2004-2007
- Member of the board at the Department of Speech, Music and Hearing, 2003-2004
Teaching
Basic level courses
- Languages, computers, and text processing (in Swedish)
- Advisor for Language Technology Project, 7.5 ECTS
- BA thesis supervision
Advanced level courses
- Research and Development, 15 ECTS
- Digital Philology, 5/7.5 ECTS
- Thesis work in language technology, 30 ECTS
- Advisor for Language Technology Project, 7.5 ECTS
- Master thesis supervision
PhD education
- Co-supervisor of Eva Pettersson and Mojgan Seraji
Other things I like: my twins, traveling, Amnesty International, workouts such as skiing, piloxing and pump, books, cello, chocolate, margaritas and cosmos, ladies of jazz, The Bridges of Madison County, and of course my dearest best friends (girls, you know who you are!) and my (often empty) not-to-do list...
Things I don't like: greed, injustice, and master suppression techniques
Research
Research interests
- Historical Cryptology
- Digital Philology focusing on the automatic analysis of historical texts and student writings
- PoS tagging, morphological analysis, chunking, shallow parsing for different types of languages
- Parallel corpora and treebanks
- Text categorization
Projects
- DECRYPT: Decryption of historical manuscripts (PI, Vetenskapsrådet: 2018-2024)
- DECODE: Automatic decoding of historical manuscripts (PI, Vetenskapsrådet: 2015-2017)
- SweLL - L2 infrastructure: Research Infrastructure for Swedish as a second language (RJ, 2017-2019)
- SWE-CLARIN - SWEGRAM: Automatic annotation and analysis of Swedish texts (Swedish Research Council, 2014-2018, 2019-2022)
-
- Swedish treebank
- Grammar extraction
- Basic Language Resource Kit for Swedish

Publications
Selection of publications
-
The DECODE Database of Historical Ciphers and Keys: Version 2
Part of Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022, p. 111-114, 2022
-
Lost in Transcription of Graphic Signs in Ciphers
Part of Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022, p. 153-158, 2022
-
Identifying Cleartext in Historical Ciphers
Part of Proceedings of the Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022, 2022
-
What Was Encoded in Historical Cipher Keys in the Early Modern Era?
Part of Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022, 2022
-
Proceedings of the 5th International Conference on Historical Cryptology
2022
-
Deciphering Papal Ciphers from the 16th to the 18th Century
Part of Cryptologia, p. 479-540, 2021
-
Transcription of Historical Ciphers and Keys: Guidelines, version 2.0
2021
-
Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images
Part of Proceedings of the 4th International Conference on Historical Cryptology HistoCrypt 2021, 2021
-
Part of Proceedings of the 28th International Conference on Computational Linguistics. COLING 2020, p. 357-369, 2020
-
Transcription of Historical Ciphers and Keys
Part of Proceedings of the 3rd International Conference on Historical Cryptology, p. 106-115, 2020
-
Proceedings of the 3rd International Conference on Historical Cryptology
2020
-
A Web-based Interactive Transcription Tool for Encrypted Manuscripts
Part of Proceedings of the 3rd International Conference on Historical Cryptology HistoCrypt 2020, 2020
-
Decryption of historical manuscripts: the DECRYPT project
Part of Cryptologia, p. 545-559, 2020
-
Proceedings of the Workshop on NLP and Pseudonymisation
2019
-
Matching Keys and Encrypted Manuscripts
Part of Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa '19), 2019
-
Pseudonymization of Language Learner Data
Part of Workshop om pseudonymisering av textdata, 2019
-
The SweLL Language Learner Corpus: From Design to Annotation
Part of Northern European Journal of Language Technology (NEJLT), p. 67-104, 2019
-
The DECODE Database: Collection of Historical Ciphers and Keys
Part of Proceedings of the 2nd International Conference on Historical Cryptology, p. 69-78, 2019
-
SWEGRAM: Annotering och analys av svenska texter
2019
-
Towards a Generic Unsupervised Method for Transcription of Encoded Manuscripts
Part of Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, 2019
-
The HistCorp Collection of Historical Corpora and Resources
Part of DHN 2018, p. 306-320, 2018
-
Annotation of learner corpora: first SweLL insights
Part of Abstracts of SLTC 2018, p. 86-89, 2018
-
Part of Proceedings of the 7th NLP4CALL, 2018
-
Proceedings of the 1st International Conference on Historical Cryptology: HistoCrypt 2018
2018
-
Annotating Errors in Student Texts: First Experiences and Experiments
Part of Proceedings of Joint 6th NLP4CALL and 2nd NLP4LA Nodalida workshop, p. 47-60, 2017
-
SWEGRAM: A Web-Based Tool for Automatic Annotation and Analysis of Swedish Texts
Part of Proceedings of the 21st Nordic Conference on Computational Linguistics, Nodalida 2017, p. 132-141, 2017
-
Transcription of Encoded Manuscripts with Image Processing Techniques
Part of Proceedings of Digital Humanities 2017, 2017
-
A Friend in Need?: Research agenda for electronic Second Language infrastructure
Part of Proceedings of SLTC 2016, 2016
-
The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
Part of LREC 2016, p. 3192-3199, 2016
-
Proceedings of the 20th Nordic Conference of Computational Linguistics
ACL Anthology, 2015
-
A Multilingual Evaluation of Three Spelling Normalization Methods for Historical Text
Part of Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), p. 32-41, 2014
-
Professional language in Swedish clinical text: Linguistic characterization and comparative studies
Part of Nordic Journal of Linguistics, p. 297-323, 2014
-
The Secrets of the Copiale Cipher
Part of Research into Freemasonry and Fraternalism, p. 314-324, 2011
-
Part of Proceedings of the NODALIDA 2009 workshop Nordic Perspectives on the CLARIN Infrastructure of Language Resources, p. 1-5, 2009
-
Part of Multilingualism, 2009
-
Cultivating a Swedish Treebank
Part of Resourceful Language Technology, p. 111-120, Acta Universitatis Upsaliensis, 2008
-
Language Resources and Tools for Swedish: A Survey
Part of Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), 2008
-
Single Malt or Blended? A Study in Multilingual Parser Optimization
Part of Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, p. 933-939, 2007
-
General-Purpose Text Categorization Applied to the Medical Domain.
2007
-
The Swedish-Turkish Parallel Corpus and Tools for its Creation
Part of Proceedings of NoDaLida 2007, 2007
-
A Study on Automatically Extracted Keywords in Text Categorization
Part of Proceedings of International Conference of Association for Computational Linguistics, 2006
-
Exploring the Prosody-Syntax Interface in Conversations
Part of Proceedings of the 15th International Congress of Phonetic Sciences, 2003
-
Part of Proceedings of Fonetik 2002, 2002
Recent publications
-
A Handwritten Text Recognition Dataset for Ajami Manuscripts in Fulfulde and Hausa
Part of Document Analysis and Recognition – ICDAR 2025, p. 620-637, 2025
-
Cipher key instructions in early modern Europe: analysis and text edition
Part of Cryptologia, p. 416-442, 2025
-
Keys with nomenclatures in the early modern Europe
Part of Cryptologia, p. 97-139, 2024
-
Towards Data-effective Educational Question Generation with Prompt-based Learning
Part of Intelligent Computing, p. 161-174, 2023
-
Historical Language Models in Cryptanalysis: Case Studies on English and German
Part of Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023, 2023
All publications
Articles in journal
-
Cipher key instructions in early modern Europe: analysis and text edition
Part of Cryptologia, p. 416-442, 2025
-
Keys with nomenclatures in the early modern Europe
Part of Cryptologia, p. 97-139, 2024
-
Few shots are all you need: A progressive learning approach for low resource handwritten text recognition
Part of Pattern Recognition Letters, p. 43-49, 2022
-
Deciphering Papal Ciphers from the 16th to the 18th Century
Part of Cryptologia, p. 479-540, 2021
-
Decryption of historical manuscripts: the DECRYPT project
Part of Cryptologia, p. 545-559, 2020
-
The SweLL Language Learner Corpus: From Design to Annotation
Part of Northern European Journal of Language Technology (NEJLT), p. 67-104, 2019
-
Parallel corpora and Universal Dependencies for Turkic
Part of Turkic languages, p. 259-273, 2015
-
Professional language in Swedish clinical text: Linguistic characterization and comparative studies
Part of Nordic Journal of Linguistics, p. 297-323, 2014
-
Bootstrapping a Persian Dependency Treebank
Part of Linguistic Issues in Language Technology, 2012
-
The Secrets of the Copiale Cipher
Part of Research into Freemasonry and Fraternalism, p. 314-324, 2011
-
Shallow Parsing with PoS Taggers and Linguistic Features.
Part of Journal of Machine Learning Research: Special Issue on Shallow Parsing, p. 639-668, 2002
Chapters in book
-
Supporting Research Environment for Less Explored Languages: A Case Study of Swedish and Turkish
Part of Resourceful Language Technology, p. 96-110, Uppsala universitet, 2008
-
Cultivating a Swedish Treebank
Part of Resourceful Language Technology. A Festschrift in Honor of Anna Sågvall Hein, p. 111-120, Acta Universitatis Upsaliensis, 2008
Collections (editor)
-
Proceedings of the 20th Nordic Conference of Computational Linguistics
ACL Anthology, 2015
-
Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein
Acta Universitatis Upsaliensis, 2008
Conference papers
-
A Handwritten Text Recognition Dataset for Ajami Manuscripts in Fulfulde and Hausa
Part of Document Analysis and Recognition – ICDAR 2025, p. 620-637, 2025
-
Towards Data-effective Educational Question Generation with Prompt-based Learning
Part of Intelligent Computing, p. 161-174, 2023
-
Historical Language Models in Cryptanalysis: Case Studies on English and German
Part of Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023, 2023
-
What is the Code for the Code? Historical Cryptology Terminology
Part of Proceedings of the 6th International Conference on Historical Cryptology HistoCrypt 2023, 2023
-
The DECODE Database of Historical Ciphers and Keys: Version 2
Part of Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022, p. 111-114, 2022
-
Lost in Transcription of Graphic Signs in Ciphers
Part of Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022, p. 153-158, 2022
-
Identifying Cleartext in Historical Ciphers
Part of Proceedings of the Workshop on Language Technologies for Historical and Ancient Languages. LT4HALA 2022, 2022
-
What Was Encoded in Historical Cipher Keys in the Early Modern Era?
Part of Proceedings of the 5th International Conference on Historical Cryptology. HistoCrypt 2022, 2022
-
Key Design in the Early Modern Era in Europe
Part of Proceedings of the 4th International Conference on Historical Cryptology (HistoCrypt 2021), 2021
-
Revealing Secrets from the Past: Studying Historical Ciphers.
2021
-
Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images
Part of Proceedings of the 4th International Conference on Historical Cryptology HistoCrypt 2021, 2021
-
Part of Proceedings of the 28th International Conference on Computational Linguistics. COLING 2020, p. 357-369, 2020
-
Automatic Key Structure Extraction
Part of Proceedings of the 3rd International Conference on Historical Cryptology, p. 146-152, 2020
-
Transcription of Historical Ciphers and Keys
Part of Proceedings of the 3rd International Conference on Historical Cryptology, p. 106-115, 2020
-
A Web-based Interactive Transcription Tool for Encrypted Manuscripts
Part of Proceedings of the 3rd International Conference on Historical Cryptology HistoCrypt 2020, 2020
-
Matching Keys and Encrypted Manuscripts
Part of Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa '19), 2019
-
Pseudonymization of Language Learner Data
Part of Workshop om pseudonymisering av textdata, 2019
-
The DECODE Database: Collection of Historical Ciphers and Keys
Part of Proceedings of the 2nd International Conference on Historical Cryptology, p. 69-78, 2019
-
Towards a Generic Unsupervised Method for Transcription of Encoded Manuscripts
Part of Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, 2019
-
The HistCorp Collection of Historical Corpora and Resources
Part of DHN 2018, p. 306-320, 2018
-
Annotation of learner corpora: first SweLL insights
Part of Abstracts of SLTC 2018, p. 86-89, 2018
-
Part of Proceedings of the 7th NLP4CALL, 2018
-
Annotating Errors in Student Texts: First Experiences and Experiments
Part of Proceedings of Joint 6th NLP4CALL and 2nd NLP4LA Nodalida workshop, p. 47-60, 2017
-
SWEGRAM: A Web-Based Tool for Automatic Annotation and Analysis of Swedish Texts
Part of Proceedings of the 21st Nordic Conference on Computational Linguistics, Nodalida 2017, p. 132-141, 2017
-
Transcription of Encoded Manuscripts with Image Processing Techniques
Part of Proceedings of Digital Humanities 2017, 2017
-
Swe-Clarin: Language Resources and Technology for Digital Humanities
Part of Digital Humanities 2016, p. 29-51, 2016
-
A Friend in Need?: Research agenda for electronic Second Language infrastructure
Part of Proceedings of SLTC 2016, 2016
-
The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
Part of LREC 2016, p. 3192-3199, 2016
-
Ranking Relevant Verb Phrases Extracted from Historical Text
Part of Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 2015
-
A Multilingual Evaluation of Three Spelling Normalization Methods for Historical Text
Part of Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), p. 32-41, 2014
-
Verb Phrase Extraction in a Historical Context
2014
-
Automatic Morphosyntactic Analysis of Clinical Text
2014
-
EACL - Expansion of Abbreviations in CLinical text
Part of Workshop on Predicting and Improving Text Readability for Target Reader Populations, PITR 2014, 2014
-
Part of Proceedings of the 19th Nordic Conference on Computational Linguistics, 2013
-
An SMT Approach to Automatic Annotation of Historical Texts
Part of Workshop on Computational Historical Linguistics, Nodalida 2013, 2013
-
A Basic Language Resource Kit for Persian
Part of Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), p. 2245-2252, 2012
-
Rule-Based Normalisation of Historical Text – a Diachronic Study
Part of Empirical Methods in Natural Language Processing, p. 333-341, 2012
-
Parsing the Past - Identification of Verb Constructions in Historical Text
Part of Language Technology for Cultural Heritage, Social Sciences, and Humanities, 2012
-
Dependency Parsers for Persian
Part of Proceedings of 10th Workshop on Asian Language Resources, COLING 2012, 24th International Conference on Computational Linguistics, Mumbai, India, 2012
-
2011
-
Using Parallel Corpora in Data-Driven Teaching of Turkish in Sweden.
p. 1686-1689, 2010
-
The English-Swedish-Turkish Parallel Treebank
Part of Proceedings of Language Resources and Evaluation (LREC 2010), 2010
-
Part of Proceedings of the NODALIDA 2009 workshop Nordic Perspectives on the CLARIN Infrastructure of Language Resources, p. 1-5, 2009
-
The Open Source Tagger HunPoS for Swedish.
Part of Proceedings of the 17th Nordic Conference on Computational Linguistics (NODALIDA), 2009
-
Part of Multilingualism, 2009
-
Swedish-Turkish Parallel Treebank
Part of Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), 2008
-
Language Resources and Tools for Swedish: A Survey
Part of Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), 2008
-
Bootstrapping a Swedish Treebank Using Cross-Corpus Harmonization and Annotation Projection
Part of Proceedings of the 6th International Workshop on Treebanks and Linguistic Theories, p. 97-102, 2007
-
Single Malt or Blended? A Study in Multilingual Parser Optimization
Part of Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, p. 933-939, 2007
-
The Swedish-Turkish Parallel Corpus and Tools for its Creation
Part of Proceedings of NoDaLida 2007, 2007
-
A Study on Automatically Extracted Keywords in Text Categorization
Part of Proceedings of International Conference of Association for Computational Linguistics, 2006
-
Building a Swedish-Turkish Parallel Corpus
Part of Proceedings of Language Resources and Evaluation Conference, 2006
-
Using Linguistic Data for Genre Classification
Part of Proceedings of SAIS-SSLS, 2005
-
The Acoustic and Morpho-Syntactic Context of Prosodic Boundaries in Dialogs.
Part of Proceedings of Fonetik 2003, 2003
-
Exploring the Prosody-Syntax Interface in Conversations
Part of Proceedings of the 15th International Congress of Phonetic Sciences, 2003
-
Silence and Discourse Context in Read Speech and Dialogues in Swedish
Part of Proceedings of the Speech Prosody 2002 conference, p. 363-366, 2002
-
Part of Proceedings of Fonetik 2002, 2002
-
Part of Proceedings of ICSLP'2002 - 7th International Conference on Spoken Language Processing, 2002
-
Data-Driven Methods for Building a Swedish Treebank.
Part of Swedish Treebank Symposium, 2002
-
Pausing in Dialogues and Read Speech: Speaker's Production and Listeners Interpretation
Part of Proceedings of the Workshop on Prosody in Speech Recognition and Understanding, 2001
-
Comparing Data-Driven Learning Algorithms for PoS Tagging of Swedish
Part of Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2001), 2001
-
A Comparative Study of Pauses in Dialogues and Read Speech.
Part of Proceedings of Eurospeech 2001, p. 931-935, 2001
-
Data-Driven Methods for PoS tagging and Chunking of Swedish
Part of Proceedings of the Nordic Conference on Computational Linguistics, Nodalida 2001, 2001
-
Phrasal Parsing by Using Data-Driven PoS Taggers
Part of Proceedings of the Conference on Recent Advances in Natural Language Processing, p. 166-173, 2001
-
Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora
Part of Proceedings of the Third International Workshop on TEXT, SPEECH and DIALOGUE, p. 27-32, 2000
-
Towards a Finite-State Parser for Swedish
Part of Proceedings of NoDaLiDa 99, p. 115-123, 2000
-
Improving Brill's PoS Tagger for an Agglutinative Language
Part of Proceedings of the Joint Sigdat Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, p. 275-284, 1999
-
Brill's PoS Tagger with Extended Lexical Templates for Hungarian
Part of Proceedings of the Workshop (W01) on Machine Learning in Human Language Technology, p. 22-28, 1999
Conference proceedings (editor)
-
2023
-
Proceedings of the 5th International Conference on Historical Cryptology
2022
-
Proceedings of the 3rd International Conference on Historical Cryptology
2020
-
Proceedings of the Workshop on NLP and Pseudonymisation
2019
-
Proceedings of the 1st International Conference on Historical Cryptology: HistoCrypt 2018
2018
Reports
-
SweLL transcription guidelines, L2 essays
2021
-
SweLL Pseudonymization Guidelines
2021
-
Transcription of Historical Ciphers and Keys: Guidelines, version 2.0
2021
-
Transcription of Historical Ciphers and Keys: Guidelines
2020
-
SWEGRAM: Annotering och analys av svenska texter
2019
-
The Open Source Tagger HunPoS for Swedish
2008
-
Survey on Swedish Language Resources
2008
-
Supporting Research Environment for Swedish and Turkish
2008
-
Converting SUC2.0 to XCES with stand-off annotation
2007
-
Changing the tokenization in Talbanken to SUC2.0
2007
-
General-Purpose Text Categorization Applied to the Medical Domain.
2007