Beáta Megyesi
Professor of Computational Linguistics at the Department of Linguistics and Philology
- Telephone:
- 018-471 78 60
- Email:
- Beata.Megyesi@lingfil.uu.se
- Visiting address:
- Engelska parken
Thunbergsvägen 3H
- Postal address:
- Box 635
751 26 UPPSALA
- Academic merits:
- PhD, Docent
- ORCID:
- 0000-0002-4838-6518
Short presentation
I am Professor of Computational Linguistics and am currently on leave from Uppsala University.
I am interested in the automatic processing and analysis of natural languages, with a particular focus on digital humanities and philology. I conduct research in historical cryptology, developing methods for automatically breaking secretly encoded documents, known as ciphers. I also develop tools that enable researchers in the humanities and social sciences to obtain quantitative analyses of their texts.
Keywords
- digital humanities
- historical cryptology
- natural language processing
Biography
Education
- Professor of Computational Linguistics, Department of Linguistics and Philology, Uppsala University, 2021
- Docent in Computational Linguistics, Department of Linguistics and Philology, Uppsala University, 2013
- PhD in Speech Communication, Department of Speech, Music and Hearing, Royal Institute of Technology (KTH), 2002
- BA in Computational Linguistics, Department of Linguistics, Stockholm University, 2000
Appointments
Current:
- Member of the Swedish Research Council's review panel for Linguistics, 2021-2023
- Vice-chair and member of the Centre for Digital Humanities, 2021-2023
- Nomination committee of the Northern European Association for Language Technology (NEALT), 2022-2025
Previous:
- President of the Northern European Association for Language Technology (NEALT), 2020-2021
- Head of Department, Department of Linguistics and Philology, Uppsala University, 2009-2018
- Director of the Engelska parken campus, Uppsala University, 2017-2018
- Vice President of the Northern European Association for Language Technology (NEALT), 2018-2019
- Teachers' representative on the department board, Department of Linguistics and Philology, Uppsala University, 2007-2009, 2010-2012, 2012-2015, 2016-2018, 2022-2024
- Member of the Faculty Board of Languages, Uppsala University, 2008-2011, 2011-2014, 2019-2020
- Director of Studies for undergraduate and advanced levels, Department of Linguistics and Philology, Uppsala University, 2007-2009
- Programme coordinator for the Language Technology Programme, Department of Linguistics and Philology, Uppsala University, 2004-2007
- Representative on the department board at Speech, Music and Hearing, KTH, 2003-2004
Teaching
Undergraduate level
- Language, Computers and Text Processing, 7.5 credits (2011-2020)
- Supervisor for the course Project Work in Language Technology, 7.5 credits (2011-2019)
- Thesis supervision
Advanced level
- Research and Development, 15 credits (2021-2022)
- Digital Philology, 5 and 7.5 credits (2018-2023)
- Degree projects in Language Technology, 30 credits
- Supervisor for the course Project in Language Technology, 7.5 credits (2011-2015)
Doctoral education
- I have been assistant supervisor for Eva Pettersson and Mojgan Seraji
Mentor network:
I have been a member of the mentor network since 2006. Pedagogical questions have always engaged me, and I am keen to help and support younger teachers in their teaching role, in matters small and large, according to individual needs. The teaching formats I use most are traditional lectures, seminars, labs, and supervision of projects and theses. I teach both freestanding and programme courses in the Language Technology Programme at bachelor's and master's level, as well as in the linguistics programme (Språkvetarprogrammet).
Other things I like: twins, travel to faraway countries, Amnesty International, books, cello, a little exercise such as skiing, piloxing and pump, chocolate, margaritas and cosmos, ladies of jazz, The Bridges of Madison County, my best friends who put up with me year after year... and my (often empty) not-to-do list...
I dislike: greed, injustice and master suppression techniques
Research
Research interests
- Historical cryptology
- Digital philology with a focus on automatic analysis of historical texts and student texts
- Part-of-speech tagging, morphological analysis, chunking, and shallow syntactic parsing for various languages
- Parallel corpora and treebanks
- Text categorization
Projects I am or have been involved in:
- DECRYPT: Decryption of historical manuscripts (PI, Swedish Research Council: 2018-2024)
- DECODE: Automatic decoding of historical manuscripts (PI, Swedish Research Council: 2015-2017)
- SweLL - L2 infrastructure: Research infrastructure for Swedish as a second language (RJ, 2017-2019)
- SWE-CLARIN: SWEGRAM: Automatic annotation and analysis of texts in Swedish (Swedish Research Council, 2014-2018, 2019-2023)
- Multilingual parallel corpus
- Swedish treebank
- Grammar extraction
- Basic resources for Swedish language technology
Publications
Selected publications
- Proceedings of the 5th International Conference on Historical Cryptology (2022)
- Identifying Cleartext in Historical Ciphers (2022)
- The DECODE Database of Historical Ciphers and Keys: Version 2 (2022)
- Lost in Transcription of Graphic Signs in Ciphers (2022)
- What Was Encoded in Historical Cipher Keys in the Early Modern Era? (2022)
- Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images (2021)
- Deciphering Papal Ciphers from the 16th to the 18th Century (2021)
- Transcription of Historical Ciphers and Keys (2021)
- A Web-based Interactive Transcription Tool for Encrypted Manuscripts (2020)
- Transcription of Historical Ciphers and Keys (2020)
- Proceedings of the 3rd International Conference on Historical Cryptology (2020)
- Decryption of historical manuscripts (2020)
- Towards Privacy by Design in Learner Corpora Research: A Case of On-the-fly Pseudonymization of Swedish Learner Essays (2020)
- Proceedings of the Workshop on NLP and Pseudonymisation (2019)
- Towards a Generic Unsupervised Method for Transcription of Encoded Manuscripts (2019)
- The DECODE Database: Collection of Historical Ciphers and Keys (2019)
- SWEGRAM: Annotering och analys av svenska texter (2019)
- Pseudonymization of Language Learner Data (2019)
- Matching Keys and Encrypted Manuscripts (2019)
- The SweLL Language Learner Corpus: From Design to Annotation (2019)
- Proceedings of the 1st International Conference on Historical Cryptology (2018)
- Learner Corpus Anonymization in the Age of GDPR (2018)
- The HistCorp Collection of Historical Corpora and Resources (2018)
- Annotation of learner corpora (2018)
- Transcription of Encoded Manuscripts with Image Processing Techniques (2017)
- SWEGRAM (2017)
- Annotating Errors in Student Texts (2017)
- The Uppsala Corpus of Student Writings (2016)
- A Friend in Need? (2016)
- Proceedings of the 20th Nordic Conference of Computational Linguistics (2015)
- A Multilingual Evaluation of Three Spelling Normalization Methods for Historical Text (2014)
- Professional language in Swedish clinical text (2014)
- The Secrets of the Copiale Cipher (2011)
- Swedish CLARIN Activities (2009)
- Using Parallel Corpora in Teaching and Research (2009)
- Language Resources and Tools for Swedish: A Survey (2008)
- Cultivating a Swedish Treebank (2008)
- General-Purpose Text Categorization Applied to the Medical Domain. (2007)
- The Swedish-Turkish Parallel Corpus and Tools for its Creation (2007)
- Single Malt or Blended? A Study in Multilingual Parser Optimization (2007)
- A Study on Automatically Extracted Keywords in Text Categorization (2006)
- Exploring the Prosody-Syntax Interface in Conversations (2003)
- Boundaries and groupings - the structuring of speech in different communicative situations: a description of the GROG project (2002)
Recent publications
- Keys with nomenclatures in the early modern Europe (2024)
- Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023) (2023)
- Historical Language Models in Cryptanalysis: Case Studies on English and German (2023)
- What is the Code for the Code? Historical Cryptology Terminology (2023)
- Towards Data-effective Educational Question Generation with Prompt-based Learning (2023)
All publications
Articles
- Keys with nomenclatures in the early modern Europe (2024)
- Few shots are all you need (2022)
- Deciphering Papal Ciphers from the 16th to the 18th Century (2021)
- Decryption of historical manuscripts (2020)
- The SweLL Language Learner Corpus: From Design to Annotation (2019)
- Parallel corpora and Universal Dependencies for Turkic (2015)
- Professional language in Swedish clinical text (2014)
- Bootstrapping a Persian Dependency Treebank (2012)
- The Secrets of the Copiale Cipher (2011)
- Shallow Parsing with PoS Taggers and Linguistic Features. (2002)
Books
- Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2023) (2023)
- Proceedings of the 5th International Conference on Historical Cryptology (2022)
- Proceedings of the 3rd International Conference on Historical Cryptology (2020)
- Proceedings of the Workshop on NLP and Pseudonymisation (2019)
- Proceedings of the 1st International Conference on Historical Cryptology (2018)
- Proceedings of the 20th Nordic Conference of Computational Linguistics (2015)
- Resourceful Language Technology (2008)
Chapters
- Supporting Research Environment for Less Explored Languages (2008)
- Cultivating a Swedish Treebank (2008)
Conference papers
- Historical Language Models in Cryptanalysis: Case Studies on English and German (2023)
- What is the Code for the Code? Historical Cryptology Terminology (2023)
- Towards Data-effective Educational Question Generation with Prompt-based Learning (2023)
- Identifying Cleartext in Historical Ciphers (2022)
- The DECODE Database of Historical Ciphers and Keys: Version 2 (2022)
- Lost in Transcription of Graphic Signs in Ciphers (2022)
- What Was Encoded in Historical Cipher Keys in the Early Modern Era? (2022)
- Unsupervised Alphabet Matching in Historical Encrypted Manuscript Images (2021)
- Revealing Secrets from the Past: Studying Historical Ciphers. (2021)
- Key Design in the Early Modern Era in Europe (2021)
- A Web-based Interactive Transcription Tool for Encrypted Manuscripts (2020)
- Transcription of Historical Ciphers and Keys (2020)
- Automatic Key Structure Extraction (2020)
- Towards Privacy by Design in Learner Corpora Research: A Case of On-the-fly Pseudonymization of Swedish Learner Essays (2020)
- Towards a Generic Unsupervised Method for Transcription of Encoded Manuscripts (2019)
- The DECODE Database: Collection of Historical Ciphers and Keys (2019)
- Pseudonymization of Language Learner Data (2019)
- Matching Keys and Encrypted Manuscripts (2019)
- Learner Corpus Anonymization in the Age of GDPR (2018)
- The HistCorp Collection of Historical Corpora and Resources (2018)
- Annotation of learner corpora (2018)
- Transcription of Encoded Manuscripts with Image Processing Techniques (2017)
- SWEGRAM (2017)
- Annotating Errors in Student Texts (2017)
- Swe-Clarin (2016)
- The Uppsala Corpus of Student Writings (2016)
- A Friend in Need? (2016)
- Ranking Relevant Verb Phrases Extracted from Historical Text (2015)
- Automatic Morphosyntactic Analysis of Clinical Text (2014)
- Verb Phrase Extraction in a Historical Context (2014)
- A Multilingual Evaluation of Three Spelling Normalization Methods for Historical Text (2014)
- EACL - Expansion of Abbreviations in CLinical text (2014)
- Normalization of historical Text Using Context-Sensitive Weighted Levenshtein Distance and Compound Splitting (2013)
- An SMT Approach to Automatic Annotation of Historical Texts (2013)
- Parsing the Past - Identification of Verb Constructions in Historical Text (2012)
- Rule-Based Normalisation of Historical Text – a Diachronic Study (2012)
- A Basic Language Resource Kit for Persian (2012)
- Dependency Parsers for Persian (2012)
- The Copiale Cipher (2011)
- Using Parallel Corpora in Data-Driven Teaching of Turkish in Sweden. (2010)
- The English-Swedish-Turkish Parallel Treebank (2010)
- Swedish CLARIN Activities (2009)
- The Open Source Tagger HunPoS for Swedish. (2009)
- Using Parallel Corpora in Teaching and Research (2009)
- Language Resources and Tools for Swedish: A Survey (2008)
- Swedish-Turkish Parallel Treebank (2008)
- The Swedish-Turkish Parallel Corpus and Tools for its Creation (2007)
- Single Malt or Blended? A Study in Multilingual Parser Optimization (2007)
- Bootstrapping a Swedish Treebank Using Cross-Corpus Harmonization and Annotation Projection (2007)
- A Study on Automatically Extracted Keywords in Text Categorization (2006)
- Building a Swedish-Turkish Parallel Corpus (2006)
- Using Linguistic Data for Genre Classification (2005)
- Exploring the Prosody-Syntax Interface in Conversations (2003)
- The Acoustic and Morpho-Syntactic Context of Prosodic Boundaries in Dialogs. (2003)
- Boundaries and groupings - the structuring of speech in different communicative situations: a description of the GROG project (2002)
- Silence and Discourse Context in Read Speech and Dialogues in Swedish (2002)
- Production and Perception of Pauses and their Linguistic Context in Read and Spontaneous Speech in Swedish. (2002)
- Data-Driven Methods for Building a Swedish Treebank. (2002)
- A Comparative Study of Pauses in Dialogues and Read Speech. (2001)
- Comparing Data-Driven Learning Algorithms for PoS Tagging of Swedish (2001)
- Data-Driven Methods for PoS tagging and Chunking of Swedish (2001)
- Phrasal Parsing by Using Data-Driven PoS Taggers (2001)
- Pausing in Dialogues and Read Speech: Speaker's Production and Listeners Interpretation (2001)
- Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora (2000)
- Towards a Finite-State Parser for Swedish (2000)
- Improving Brill's PoS Tagger for an Agglutinative Language (1999)
- Brill's PoS Tagger with Extended Lexical Templates for Hungarian (1999)
Reports
- SweLL Pseudonymization Guidelines (2021)
- Transcription of Historical Ciphers and Keys (2021)
- SweLL transcription guidelines, L2 essays (2021)
- Transcription of Historical Ciphers and Keys (2020)
- SWEGRAM: Annotering och analys av svenska texter (2019)
- Survey on Swedish Language Resources (2008)
- The Open Source Tagger HunPoS for Swedish (2008)
- Supporting Research Environment for Swedish and Turkish (2008)
- General-Purpose Text Categorization Applied to the Medical Domain. (2007)
- Changing the tokenization in Talbanken to SUC2.0 (2007)
- Converting SUC2.0 to XCES with stand-off annotation (2007)