STATISTICS Review Seminar: Väinö Yrjänäinen
- Date: 11 February 2026, 10:15–11:30
- Location: Ekonomikum, H317
- Type: Seminar
- Organiser: Department of Statistics, Uppsala University
Speaker: Väinö Yrjänäinen, Department of Statistics, Uppsala University
Opponent: Pierre Nyquist, Department of Mathematical Sciences, Chalmers University of Technology and Gothenburg University
Abstract: Data accuracy is crucial for reliable research, accurate decision-making, and high-performance machine learning. However, maintaining data accuracy, especially at scale, is complicated and difficult to verify. By integrating concepts from software engineering, statistical quality control, and branching process theory, we formalize an iterative data curation framework that scales well to large data sets. We go on to prove that the proposed approach asymptotically eliminates all errors in the data with probability one. Additionally, we provide theoretical guarantees that data accuracy tests speed up error reduction. We corroborate these results through simulations on text and tabular data, and a real-world application to the Swedish Parliamentary Corpus, demonstrating the framework’s effectiveness in preserving high-accuracy historical records at scale.