CIM Seminar with Camille Clouard
- Date: 26 April 2022, 12:15–13:00
- Location: Ångström Laboratory, Å4004
- Type: Seminar
- Lecturer: Camille Clouard
- Contact person: Oskar Tegby
Title: Probabilistic decoding methods of pooled experiments for SNP genotype imputation
Abstract
The information conveyed by genetic markers such as Single Nucleotide Polymorphisms (SNPs) has been widely used in biomedical research to study human diseases, and increasingly in agriculture, where plant and animal breeders use it for selection purposes.
Specific identified markers can act as a genetic signature that is correlated with certain characteristics of a living organism, e.g. sensitivity to a disease or high-yield traits.
Capturing these signatures with sufficient statistical power often requires large volumes of data, both in terms of the number of samples to analyze (thousands) and in the number of genetic markers to screen (up to millions). Establishing statistical significance for effects from genetic variations is especially delicate when they occur at low frequencies.
The production cost of marker data (genotype data) is therefore a critical part of the analysis. It can still be prohibitive, despite recent technological advances. Group testing strategies, also called pooling strategies, can be well suited to efficiently genotyping the rare variants in a genome with few genotyping tests. However, because of the particular nature of genotype data and the limitations inherent to genotype testing techniques, decoding pooled genotypes into a unique genotype for each sample is a challenge. Overall, the decoding problem with pooled genotypes can be described as an inference problem in Missing Not At Random data with nonmonotone missingness patterns.
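To make the decoding difficulty concrete, here is a minimal Python sketch of a toy pooling design. Everything in it is an assumption made for illustration and is not taken from the talk: samples sit on a 4x4 grid, each row and each column forms one pool (8 tests for 16 samples), each pool only reports whether the rare variant is present, and the helper names `pool_outcomes` and `decode` are hypothetical.

```python
from itertools import product


def pool_outcomes(carriers, n=4):
    """Simulate the row and column pools of an n x n grid of samples.

    `carriers` is a set of (row, col) positions carrying the rare variant.
    A pool is positive as soon as it contains at least one carrier.
    """
    rows = [any((r, c) in carriers for c in range(n)) for r in range(n)]
    cols = [any((r, c) in carriers for r in range(n)) for c in range(n)]
    return rows, cols


def decode(rows, cols):
    """Deterministic decoding of the pool outcomes, sample by sample."""
    cells = list(product(range(len(rows)), range(len(cols))))
    # A sample in any negative pool cannot be a carrier.
    candidates = [(r, c) for r, c in cells if rows[r] and cols[c]]
    status = {cell: "non-carrier" for cell in cells}
    for r, c in candidates:
        # A positive pool with a single candidate pins that candidate down.
        alone_in_row = sum(1 for r2, _ in candidates if r2 == r) == 1
        alone_in_col = sum(1 for _, c2 in candidates if c2 == c) == 1
        status[(r, c)] = "carrier" if (alone_in_row or alone_in_col) else "ambiguous"
    return status


if __name__ == "__main__":
    for carriers in [{(1, 2)}, {(0, 0), (1, 1)}]:
        rows, cols = pool_outcomes(carriers)
        status = decode(rows, cols)
        unresolved = sorted(cell for cell, s in status.items() if s == "ambiguous")
        print(sorted(carriers), "-> ambiguous samples:", unresolved)
```

With a single carrier, the eight pools identify it exactly. With two carriers in different rows and columns, four samples are consistent with the pool outcomes and none can be resolved, which is the kind of structured missingness the abstract refers to.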
In this context, I will present an algorithm based on an Expectation-Maximization scheme that we have implemented to exploit the combinatorial information in a pooling design and turn it into probabilistic estimates of the genotypes for every item within a pool. Our estimates are proposed as input to classical imputation methods for genotypes, such as tree-based haplotype clustering or coalescent models. The combined results of our scheme and these tools will then yield high-quality genotypes for all positions.
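As a rough illustration of how an Expectation-Maximization scheme can turn pool outcomes into probabilistic estimates, the Python sketch below uses a deliberately simplified model. It is not the algorithm presented in the talk. The assumptions are: samples are arranged in grids with one pool per row and per column, each pool only reports presence or absence of the variant, each sample is a carrier independently with an unknown frequency `p`, and real genotype data has more states than carrier and non-carrier. The function names are hypothetical.

```python
from itertools import product


def posterior_carriers(rows, cols, p):
    """E-step: P(carrier | pool outcomes, p) for every sample of one grid."""
    cand = [(r, c) for r, pos_r in enumerate(rows)
            for c, pos_c in enumerate(cols) if pos_r and pos_c]
    post = {(r, c): 0.0 for r in range(len(rows)) for c in range(len(cols))}
    if not cand:
        return post
    pos_rows = {r for r, _ in cand}
    pos_cols = {c for _, c in cand}
    total = 0.0
    weights = {cell: 0.0 for cell in cand}
    # Enumerate carrier subsets of the candidates; keep only those that
    # reproduce every positive row pool and every positive column pool.
    for bits in product((0, 1), repeat=len(cand)):
        chosen = [cell for cell, b in zip(cand, bits) if b]
        if {r for r, _ in chosen} != pos_rows or {c for _, c in chosen} != pos_cols:
            continue
        k = len(chosen)
        w = p ** k * (1 - p) ** (len(cand) - k)
        total += w
        for cell in chosen:
            weights[cell] += w
    for cell in cand:
        post[cell] = weights[cell] / total
    return post


def em_carrier_frequency(blocks, p=0.1, n_iter=50):
    """Alternate E-steps (posteriors) and M-steps (update of p)."""
    n_samples = sum(len(rows) * len(cols) for rows, cols in blocks)
    for _ in range(n_iter):
        posts = [posterior_carriers(rows, cols, p) for rows, cols in blocks]
        # M-step: expected number of carriers divided by number of samples.
        p = sum(sum(post.values()) for post in posts) / n_samples
    # Final posteriors under the estimated frequency.
    posts = [posterior_carriers(rows, cols, p) for rows, cols in blocks]
    return p, posts


if __name__ == "__main__":
    blocks = [
        ([True, True, False, False], [True, True, False, False]),  # ambiguous
        ([False] * 4, [False] * 4),                                # no carrier
        ([False, True, False, False], [False, False, True, False]),  # resolved
    ]
    p_hat, posts = em_carrier_frequency(blocks)
    print("estimated carrier frequency:", round(p_hat, 4))
    print("posterior of an ambiguous sample:", round(posts[0][(0, 0)], 4))
```

In this toy setting, the samples that deterministic decoding leaves unresolved receive a posterior probability instead of a missing value, and such probabilities are the kind of soft input that a downstream imputation method could consume.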
Registration
Register for the event here. Registration closes on 24 April.