Carl Nettelblad
Senior Lecturer/Associate Professor at Department of Information Technology; Division of Scientific Computing
- Telephone:
- +46 18 471 22 68
- Mobile phone:
- +46 70 359 12 42
- E-mail:
- carl.nettelblad@it.uu.se
- Visiting address:
- Hus 10, Lägerhyddsvägen 1
- Postal address:
- Box 337
751 05 UPPSALA
Technical Coordinator at Department of Information Technology; Uppsala Multidisciplinary Centre for Advanced Computational Science
- Mobile phone:
- +46 70 359 12 42
- E-mail:
- carl.nettelblad@uppmax.uu.se
- Visiting address:
- Hus 10, Lägerhyddsvägen 1
- Postal address:
- Box 337
751 05 Uppsala
- Academic merits:
- Docent
- ORCID:
- 0000-0003-0458-6902
Short presentation
My research is in the field of scientific computing, with a firm focus on data analysis for life science applications, utilizing modern computing architectures (including GPU computations and massive parallelism in various forms). My basic question is: how can we trade experimental data quality for more sophisticated computational methods, obtaining better results from worse original data?
Keywords
- artificial intelligence
- genomics
- hidden markov models
- hpc
- machine learning
- optimization
- xfel
Biography
The advances over the last two decades in techniques and methods for massive data collection have been tremendous in many sectors, including life science. This technological development has allowed ever larger data sets. The sizes being analyzed easily surpass the point where a single scientist can perform any kind of thorough manual quality control. Therefore, the development of analysis methods where errors and inaccuracies can be automatically identified and handled is crucial.
A central aspect of my research is thus that primary data will always contain errors, missing data, and noise. Based on this insight, one can try to develop methods for handling "bad" samples, or to allow less expensive experimental methodology for equivalent results. This contrasts with the more established approaches, which essentially involve heavy filtering to identify high-quality sections in datasets, or applying methods that were really designed for high-quality data to all parts of a dataset, in the hope that the results will still end up OK.
Currently, my collaboration focus is on applications in the areas of single-particle coherent diffraction imaging using X-ray free-electron lasers, and modelling haplotype structure, genotype imputation, and phasing in low-coverage/high-error-rate genomic data. However, I am always interested in pursuing and discussing other applications where statistical modelling and massive computational efforts are relevant.
Trivia: I have a history way back as a participant/medalist in international science competitions, such as IMO (mathematics), IOI (programming), IBO (biology), ACM ICPC (programming).
Are you a student seeking a thesis subject or a course project in areas related to data analysis, HPC, or bioinformatics? Are you seeking a PhD or postdoc position? Get in touch. Projects can be tailored to a rather wide range of backgrounds, while still staying within my research areas. We are currently eager to explore new architectures for our neural networks for genomic data (including transformers and diffusion models).
Publications
Selection of publications
- Achieving improved accuracy for imputation of ancient DNA (2023)
- A deep learning framework for characterization of genotype data (2022)
- A joint use of pooling and imputation for genotyping SNPs (2022)
- Hummingbird (2016)
- Coherent diffraction of single Rice Dwarf virus particles using hard X-rays at the Linac Coherent Light Source (2016)
- Three-dimensional reconstruction of the giant mimivirus particle with an X-ray free-electron laser (2015)
- Imputation of single nucleotide polymorphism genotypes in biparental, backcross, and topcross populations with a hidden Markov model (2015)
- Breakdown of methods for phasing and imputation in the presence of double genotype sharing (2013)
- Fast and accurate detection of multiple quantitative trait loci (2013)
- Inferring haplotypes and parental genotypes in larger full sib-ships and other pedigrees with missing or erroneous genotype data (2012)
- cnF2freq (2009)
Recent publications
- Genotyping of SNPs in bread wheat at reduced cost from pooled experiments and imputation (2024)
- Observation of a single protein by ultrafast X-ray diffraction (2024)
- Data-Driven Locality-Aware Batch Scheduling (2024)
- Project Hephaistos – II. Dyson sphere candidates from Gaia DR3, 2MASS, and WISE (2024)
- Achieving improved accuracy for imputation of ancient DNA (2023)
All publications
Articles
- Genotyping of SNPs in bread wheat at reduced cost from pooled experiments and imputation (2024)
- Observation of a single protein by ultrafast X-ray diffraction (2024)
- Project Hephaistos – II. Dyson sphere candidates from Gaia DR3, 2MASS, and WISE (2024)
- Achieving improved accuracy for imputation of ancient DNA (2023)
- A deep learning framework for characterization of genotype data (2022)
- An empirical evaluation of genotype imputation of ancient DNA (2022)
- A joint use of pooling and imputation for genotyping SNPs (2022)
- Ptychographic wavefront characterization for single-particle imaging at x-ray lasers (2021)
- Training algorithm matters for the performance of neural network potential (2021)
- Diffraction data from aerosolized Coliphage PR772 virus particles imaged with the Linac Coherent Light Source (2020)
- Flash X-ray diffraction imaging in 3D (2020)
- Electrospray sample injection for single-particle imaging with x-ray lasers (2019)
- The presence and impact of reference bias on population genomic studies of prehistoric human populations (2019)
- BAMSI (2018)
- Femtosecond X-ray Fourier holography imaging of free-flying nanoparticles (2018)
- Assessing uncertainties in x-ray single-particle three-dimensional reconstruction (2018)
- Considerations for three-dimensional image reconstruction from experimental data in coherent diffractive imaging (2018)
- A statistical approach to detect protein complexes at X-ray free electron laser facilities (2018)
- Using convex optimization of autocorrelation with constrained support and windowing for improved phase retrieval accuracy (2018)
- A hybrid method for the imputation of genomic data in livestock populations (2017)
- Experimental strategies for imaging bioparticles with femtosecond hard X-ray pulses (2017)
- Correlations in scattered X-ray laser pulses reveal nanoscale structural features of viruses (2017)
- A flexible computational framework using R and Map-Reduce for permutation tests of massive genetic analysis of complex traits (2017)
- Artifact reduction in the CSPAD detectors used for LCLS experiments (2017)
- Coherent soft X-ray diffraction imaging of Coliphage PR772 at the Linac coherent light source (2017)
- Hummingbird (2016)
- QTL as a service (2016)
- Coherent diffraction of single Rice Dwarf virus particles using hard X-rays at the Linac Coherent Light Source (2016)
- Three-dimensional reconstruction of the giant mimivirus particle with an X-ray free-electron laser (2015)
- Imputation of single nucleotide polymorphism genotypes in biparental, backcross, and topcross populations with a hidden Markov model (2015)
- MAPfastR (2013)
- Breakdown of methods for phasing and imputation in the presence of double genotype sharing (2013)
- Fast and accurate detection of multiple quantitative trait loci (2013)
- Inferring haplotypes and parental genotypes in larger full sib-ships and other pedigrees with missing or erroneous genotype data (2012)
- Coherent estimates of genetic effects with missing information (2012)
- An improved method for estimating chromosomal line origin in QTL analysis of crosses between outbred lines (2011)
- Using feedback in pooled experiments augmented with imputation for high genotyping accuracy at reduced cost
Books
- Two Optimization Problems in Genetics (2012)
- Using Markov models and a stochastic Lipschitz condition for genetic analyses (2010)
Conferences
- Data-Driven Locality-Aware Batch Scheduling (2024)
- Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment (2020)
- Feature evaluation for handwritten character recognition with regressive and generative Hidden Markov Models (2016)
- Haplotype inference based on hidden Markov models in the QTL–MAS 2010 multigenerational dataset (2011)
- A Grid-Enabled Problem Solving Environment for QTL Analysis in R (2010)
- cnF2freq (2009)
Reports
- Consistency Study of a Reconstructed Genotype Probability Distribution via Clustered Bootstrapping in NORB Pooling Blocks (2022)
- Breakdown of methods for phasing and imputation in the presence of double genotype sharing (2012)
- Inferring haplotypes and parental genotypes in larger full sib-ships and other pedigrees with missing or erroneous genotype data (2012)
- Assessing orthogonality and statistical properties of linear regression methods for interval mapping with partial information (2010)
- Stochastically Guaranteed Global Optimums Achievable with a Divide-and-Conquer Approach to Multidimensional QTL Searches (2010)