Statistics seminars
The Department of Statistics organises weekly seminars during the working periods of every semester. The seminars take place in seminar room H317 on Wednesdays, starting at 10:15. All interested participants are welcome.
The seminars cover all research areas actively pursued at the department, in both theory and applications. For the major part of the seminar series, national and international speakers are invited with the aim of connecting the department with colleagues across Sweden and abroad who work in areas of mutual interest. The rest of the series in each semester is reserved for doctoral students, who present the latest findings of their research.
For questions and more information about the UU Statistics seminars, please contact Yukai Yang.
Seminars 2024
Autumn
Review Seminar 2024-09-04: Transformer assisted survey sampling for efficient finite population statistics in highly imbalanced textual data: public hate crime estimation
Speaker: Hannes Waldetoft, Department of Statistics, Uppsala University
Time and place: 2024-09-04, 10:15 - 11:45, Ekonomikum, Room H317
Abstract
Estimating population parameters in finite populations of text documents can be challenging when obtaining the labels for the target variable requires manual annotation. To address this problem, we combine predictions from a transformer encoder neural network with well-established survey sampling estimators. This is done by training a classifier and then using the model predictions as an auxiliary variable in the estimators. The applicability is demonstrated on Swedish hate crime statistics, which are based on Swedish police reports, of which approximately 1.5 million are filed annually. Estimates of the yearly number of hate crimes, the police's under-reporting, and proportions of specific hate crime types are derived using the Hansen-Hurwitz estimator, regression estimation, and stratified random sampling. We conclude that if labeled training data is available, the proposed method can provide efficient estimates with reduced time spent on manual annotation.
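As an illustration of how classifier predictions can serve as an auxiliary variable, the sketch below applies the Hansen-Hurwitz estimator to a sample drawn with replacement, with selection probabilities proportional to a model score. It is not the speaker's implementation: the function names, the synthetic population, and the choice of using the score directly as the size measure are all assumptions made for this example.

```python
import random


def hansen_hurwitz_total(y_sample, p_sample):
    """Hansen-Hurwitz estimate of a population total and its estimated variance.

    y_sample: annotated values for the n draws (sampling with replacement)
    p_sample: per-draw selection probability of each drawn unit
    """
    n = len(y_sample)
    if n < 2:
        raise ValueError("need at least two draws to estimate the variance")
    ratios = [y / p for y, p in zip(y_sample, p_sample)]
    total = sum(ratios) / n
    variance = sum((r - total) ** 2 for r in ratios) / (n * (n - 1))
    return total, variance


def draw_and_estimate(scores, annotate, n, seed=0):
    """Draw n units with probability proportional to `scores`, annotate only
    the drawn units, and return the Hansen-Hurwitz total and variance.

    `scores` stands in for classifier predictions and must be strictly
    positive; `annotate` is a hypothetical callback for manual labelling.
    """
    rng = random.Random(seed)
    score_sum = sum(scores)
    probs = [s / score_sum for s in scores]
    drawn = rng.choices(range(len(scores)), weights=probs, k=n)
    y = [annotate(i) for i in drawn]
    p = [probs[i] for i in drawn]
    return hansen_hurwitz_total(y, p)


if __name__ == "__main__":
    # Synthetic, highly imbalanced population (assumed for illustration only).
    rng = random.Random(1)
    labels = [1 if rng.random() < 0.02 else 0 for _ in range(5000)]
    # Noisy scores that are higher for positive documents, kept inside (0, 1).
    scores = [(0.8 if y else 0.05) + rng.uniform(-0.04, 0.04) for y in labels]
    total, var = draw_and_estimate(scores, lambda i: labels[i], n=200, seed=2)
    print(f"true total: {sum(labels)}, estimate: {total:.1f}, "
          f"std. error: {var ** 0.5:.1f}")
```

The more closely the scores track the true labels, the less the ratios y/p vary across draws and the smaller the estimated variance, which is why a good classifier can reduce the amount of manual annotation needed for a given precision.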