Stina Zetterström: Bounds for selection bias in causal inference
On the 6th of December at 09:15, Stina Zetterström will defend the thesis ‘Bounds for selection bias in causal inference’.
Many research questions have a causal nature, where the causal effect of a treatment variable (e.g. a vaccine) on an outcome variable (e.g. mortality) is of interest. When randomized experiments are not possible, applied researchers must resort to observational studies. However, observational studies can suffer from bias, for instance due to confounding, where the researcher fails to incorporate common causes of the treatment and outcome variables into the analysis.
In the present thesis, however, another type of bias is of interest: selection bias. In this context, selection bias refers to bias that may arise from some form of selection of the subjects in the study. Such selection could occur due to, for instance, missing data or the construction of the study population, depending on the population of interest in the study. A typical example is Berkson’s bias, where the relationship between two diseases is studied only in hospitalized patients. In this case, the inclusion of only hospitalized patients complicates the assessment of the causal effect and can distort the results.
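Berkson’s bias can be illustrated with a small exact calculation. The sketch below is not taken from the thesis; it assumes two independent diseases A and B, so that the true risk ratio of B comparing people with and without A is 1, and a hypothetical hospitalization model in which each disease independently raises the chance of being hospitalized. All probabilities and function names are made-up values chosen only for illustration.

```python
# Hypothetical setup: diseases A and B are independent in the population,
# so A has no effect on B and the true risk ratio is 1.

def hospitalization_prob(a, b, h_base=0.01, h_a=0.3, h_b=0.3):
    """P(hospitalized | A=a, B=b): a baseline risk, plus each disease
    independently adding its own chance of hospitalization."""
    return 1 - (1 - h_base) * (1 - h_a) ** a * (1 - h_b) ** b


def risk_of_b(a, p_b=0.1, hospitalized_only=False):
    """P(B=1 | A=a), either in the whole population or among
    hospitalized patients only (via Bayes' rule)."""
    if not hospitalized_only:
        return p_b  # B is independent of A in the whole population
    with_b = p_b * hospitalization_prob(a, 1)
    without_b = (1 - p_b) * hospitalization_prob(a, 0)
    return with_b / (with_b + without_b)


rr_population = risk_of_b(1) / risk_of_b(0)
rr_hospitalized = (risk_of_b(1, hospitalized_only=True)
                   / risk_of_b(0, hospitalized_only=True))

print(f"Risk ratio in the whole population: {rr_population:.2f}")
print(f"Risk ratio among hospitalized only: {rr_hospitalized:.2f}")
```

Under these assumed numbers, the risk ratio among hospitalized patients falls well below 1 even though the diseases are unrelated: the selection alone makes disease A appear protective against disease B.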
One way to handle bias is to study its potential impact on the causal analysis and carry out a sensitivity analysis to investigate how robust the results are. There are several methods of conducting sensitivity analyses, and one approach is to calculate bounds for the magnitude of the bias. Such a bound gives a statement of the form “if a study has selection bias, the bias will not be larger than the bound”. These bounds can be based on data, on unknown sensitivity parameters for which the researcher inputs plausible values based on experience, or on a combination of both.
In this thesis, selection bias is defined in terms of causal estimands in both the total population and a subpopulation, and several bounds for selection bias are investigated. Furthermore, software for calculating the bounds is developed and presented. The bounds in this thesis can be used in studies that estimate a causal effect on either the difference or the ratio scale and that are suspected to suffer from selection bias. The different bounds presented can then be used in different situations, depending on the additional knowledge available.
For instance, the causal risk ratio for the effect of cancer screening on cancer prevention is often of interest. In such studies, the study population typically only includes patients without a previous cancer diagnosis. However, one way to detect cancer is through screening. The exclusion of patients with a previous diagnosis can then lead to an overestimation of the preventive effect, as the composition of patients in the screening group will differ from that of the non-screening group. The maximum magnitude of this potential selection bias can be assessed with the different bounds presented in this thesis, depending on the data available and the assumptions that can be made.
