Giving structure to the data

5 Jun 2024

Biostatistician Michael Schomaker – a new face at LMU – conducts research into statistical methods for causal issues.

Should children infected with HIV be treated immediately, or is it better to wait a while? “In the past, possible consequences arising from the irregular administration of medication – such as the development of resistance that leaves fewer therapeutic options in the future – were one argument against early treatment,” explains statistician Michael Schomaker, who first tackled this issue at the University of Cape Town.

“Conventional statistical comparisons make one fundamental mistake: If I compare those who have been treated with those who haven’t, it appears as if the latter are doing better. But that is because, in the past, treatment used to commence only when patients’ health was in a worse condition. Any fair comparison of treatment options must make appropriate provision for this circumstance.” This discovery prompted Schomaker to address the topic of causality: the ability to derive cause-and-effect relationships from observed data. “The principle is a complex one,” he says. “Alongside the data itself, you have to answer structural questions: Where does the data come from? What mechanisms are involved?”
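The distortion described here can be made concrete with a small numerical sketch. The patient counts below are invented purely for illustration, and the sketch assumes that baseline health is the only factor influencing both who gets treated and how patients fare. In these made-up numbers, treatment helps in both groups, yet the naive comparison suggests the opposite because most treated patients were already sick. This is a generic stratified comparison, not necessarily the method Schomaker uses.

```python
# Invented counts: (baseline_health, treated) -> (patients, good_outcomes)
counts = {
    ("sick", True): (800, 400),      # 50% do well
    ("sick", False): (200, 80),      # 40% do well
    ("healthy", True): (200, 180),   # 90% do well
    ("healthy", False): (800, 640),  # 80% do well
}

def naive_difference(counts):
    """Treated minus untreated success rate, ignoring baseline health."""
    def rate(treated):
        n = sum(c[0] for (_, t), c in counts.items() if t == treated)
        good = sum(c[1] for (_, t), c in counts.items() if t == treated)
        return good / n
    return rate(True) - rate(False)

def adjusted_difference(counts):
    """Compare within each health stratum, then average by stratum size."""
    strata = {s for s, _ in counts}
    total = sum(n for n, _ in counts.values())
    diff = 0.0
    for s in strata:
        n_t, g_t = counts[(s, True)]
        n_u, g_u = counts[(s, False)]
        weight = (n_t + n_u) / total
        diff += weight * (g_t / n_t - g_u / n_u)
    return diff

print(round(naive_difference(counts), 2))     # negative: treatment looks harmful
print(round(adjusted_difference(counts), 2))  # positive: treatment helps
```

The adjusted figure is only trustworthy if the structural assumption holds, which is exactly the kind of question that cannot be answered from the data alone.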

Professor Michael Schomaker

© LMU/LC Productions

In May of last year, Schomaker took up his current position as Professor of Biostatistics at LMU’s Faculty of Mathematics, Informatics and Statistics. “Classical statistics concerns itself mostly with associations, correlations and clear relationships. However, causality requires a counterfactual mindset, addressing relationships that are not merely observable but genuinely causal.” Using biostatistical methods, the professor explores not only matters relating to medication, but also the way it is administered and, indeed, the nature of entire healthcare programs.

Epidemics such as tuberculosis and HIV

Schomaker earned his degree and doctorate at LMU’s Department of Statistics, writing both theses on the subject of “missing data”. After brief research stays in places such as Hong Kong, he took up a post as Senior Lecturer in South Africa in 2011. “At the University of Cape Town, I first approached the discipline of biostatistics, which applies statistical methodologies to the life sciences.” His interdisciplinary research in epidemiology brought together statisticians, clinicians and public health experts, among others. He focused on infectious diseases such as tuberculosis and, above all, HIV. “South Africa’s national HIV program had only just been established at the time. Our job was to build and analyze cohorts of patients suffering from this disease.”

After eight years in South Africa, Schomaker became Associate Professor at the University of Innsbruck, where he spent two more years working on statistical methods for applications in epidemiology. The German Research Foundation’s Heisenberg Program, to which he applied with a project on causality in complex longitudinal data with applications in pharmaco-epidemiology, ultimately supported his return to LMU as a professor. Once back in Munich, he continued to study causality, albeit with “difficult statistical data, i.e. data with gaps and measurement errors”.

Schomaker continues to collaborate with the humanitarian organization Médecins sans Frontières and with researchers in Cape Town. “Since the Covid pandemic, biostatistics has attracted greater attention, especially in the fields of epidemiology and public health – in relation to co-infection with other lung diseases such as tuberculosis, for example.”

Beyond that, LMU is a hotbed of new opportunities for cooperation across the boundaries of individual disciplines. At the Center for Advanced Studies, he engages in interdisciplinary research into the processing of medical data. In another project, he works together with intensive care specialists. Meanwhile, he is also researching issues of fairness in conjunction with colleagues at the Munich Center for Machine Learning (MCML).

Fairness in the granting of credit

“Fairness comes into play with regard to insurance premiums and bank loans, for example,” Schomaker notes. “Machine learning (ML) is typically used to make predictions – such as the probability that someone will repay a loan. If the law prescribes the protection of certain attributes such as gender, age or origin, this adds another ‘twist’ to the actual data and leaves us working with hypothetical ‘what if’ data: Would the person have been able to repay the loan if they had a different gender (which would in turn have consequences for the person’s job situation, income etc.)?” One reason for protecting such attributes could be historical discrimination, which – if the data were not corrected in the ML model – would simply continue into the future. “For that, however, I need not just statistics but also structural assumptions,” the professor explains.
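A toy sketch can show what such a “what if” question looks like once a structural assumption is written down. Everything here is hypothetical: the income equation, the size of the pay gap, the scoring function and the binary coding of gender are invented for illustration and do not come from Schomaker’s work. The sketch assumes that gender shifts income by a fixed amount and that the rest of a person’s income is individual to them.

```python
import math

# Hypothetical structural assumption: income = BASE + GAP * is_male + u,
# where u captures everything individual about the person.
BASE = 30_000.0
GAP = 5_000.0  # assumed historical pay gap, invented for illustration

def predict_repayment(income):
    """Toy repayment score: a logistic function of income, not a fitted model."""
    return 1.0 / (1.0 + math.exp(-(income - 32_000.0) / 4_000.0))

def counterfactual_income(income, is_male):
    """Income the same person would have had with the other gender (is_male is 0 or 1)."""
    u = income - BASE - GAP * is_male       # recover the individual component
    return BASE + GAP * (1 - is_male) + u   # flip gender, keep u unchanged

observed_income = 31_000.0
what_if_income = counterfactual_income(observed_income, is_male=0)

print(what_if_income)
print(predict_repayment(observed_income) < predict_repayment(what_if_income))
```

Under these assumptions the same person receives a higher score in the “what if” world, so a model trained on the observed data alone would carry the pay gap forward. A different structural assumption would give a different answer, which is why statistics alone is not enough.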

Schomaker’s teaching, too, repeatedly revolves around causality. “In one lecture, for example, we were talking about a patient’s lawsuit against a pharmaceutical company,” he recalls. “She believed that she had suffered serious injury because of a drug. But was the drug really the cause?” A logical line of argument in court requires knowledge that goes above and beyond data alone. “And before even looking at the data, you have to ask yourself: Is it at all possible to find a watertight answer to my research question?” In some cases, the statistician insists, the mechanisms involved are simply too complex.
