Tatjana Pavlenko promoted to Professor of Statistics at Uppsala University
The Deputy Vice-Chancellor of Uppsala University has promoted Tatjana Pavlenko to Professor of Statistics, effective 1 October 2021.
Large-scale statistical learning
(Image: Tatjana Pavlenko, Professor at the Department of Statistics)
Tatjana Pavlenko’s research interests lie in the broad and active area of large-scale statistical learning, a body of statistical and computational methodologies for discovering important features of input data. The term “large-scale” refers to massive volumes of complex information, possibly collected from heterogeneous sources, where the number of automatically measured features can be far greater than the number of observational units.
Application domains are very wide and include astrophysical source detection, signal processing, forensics and biomedical research such as functional genomics, proteomics and drug development, to name just a few. In such applications, large-scale datasets are nowadays generated at much lower cost thanks to advances in computational technology and increases in computing power, and statistical methodology struggles, most often successfully, to keep up with the rapidly growing rate of scientific data production.
Understand what data says
“As statisticians, our major goal is to gain valuable knowledge from these types of data, that is, to capture predictive structural patterns and important trends and so understand what the data says. That is what we call statistical learning, i.e. learning from data,” says Tatjana Pavlenko.
Biomedical applications of methodological findings
“Like many statisticians, I have a theoretical and rather mathematical background, but biomedical applications of methodological findings have always attracted my interest. Current medical genetic studies show that many diseases, such as cancer, multiple sclerosis and diabetes, are complex systems whose behaviour can be studied using statistical modelling in high dimensions,” she says.
“Revolutionary biomedical technologies such as microarrays and microbial genome sequencing make it possible to assess the individual activity of thousands of genes at once. However, surveying so many features simultaneously is still statistically challenging because of the high-dimensionality phenomenon, which renders conventional multivariate inferential methods inappropriate,” explains Tatjana Pavlenko.
Methodological research conducted by Tatjana Pavlenko and her colleagues addresses this challenge. “One of our recent developments is a novel supervised learning-based framework for high-dimensional classification and prediction problems, developed jointly with Associate Professor Rauf Ahmad from the Department of Statistics, Uppsala University,” says Tatjana Pavlenko. “In particular, we have designed an efficient classification procedure which, besides a number of interesting theoretical properties, can scale to very high-dimensional datasets on a resource-limited device and does not require any data pre-processing such as dimensionality reduction. A real-life application, automated discrimination between different types of lymphoid malignancy, demonstrated very high accuracy using more than 5000 gene expressions and only 77 patients.”
In addition to the high-dimensionality issue, the relevant effects are often sparse: out of a huge number of automatically measured features, most contain merely noise and only a small fraction contains signals, or information of interest. “The sparsity of relevant information hidden in a high-dimensional dataset is a very special setting which poses a needle-in-a-haystack detection problem,” says Tatjana Pavlenko. “To tackle this problem, together with Professor Natalia Stepanova from Carleton University, Canada, we have developed some theoretical ideas for detecting certain types of sparse and weak effects in high dimensions, with which we have seen fascinating new behaviours.”
“Our results not only shed light on the general problem of detecting sparse and weak effects but also contribute to the development of practical statistical learning methodologies which combine high-dimensional statistical inference with algorithmic thinking. Jointly with Fredrik Boulund and Justine Debelius from the Center for Translational Microbiome Research at Karolinska Institutet, and Annika Tillander from the Division of Statistics and Machine Learning at Linköping University, we are launching the sparse-weak signal detection algorithms to analyse differentially abundant features in the human gut microbiome,” explains Tatjana Pavlenko.
Artificial Intelligence (AI) and machine learning
“Uppsala University gives me an opportunity to continue working on my ideas within the framework of AI4Research, an exciting project which focuses on strengthening, renewing and developing research in Artificial Intelligence and machine learning. Due to its cross-disciplinary nature, this project is very rewarding and allows me to work with both theory and practice, and possibly to identify new areas where my research results can be applied,” says Tatjana Pavlenko.
Tatjana Pavlenko, Professor of Statistics at the Department of Statistics, Uppsala University.