
Reliable AI: “If AI is to make decisions, it needs context and causality”

12 Jun 2025

LMU statistician Christoph Kern on algorithms in sensitive fields of application.

Professor Christoph Kern | © LC Production

Medicine, transport, human resources – artificial intelligence (AI) is already supporting decision-making in many safety-relevant fields. Christoph Kern, Junior Professor of Social Data Science and Statistical Learning at LMU’s Department of Statistics, and colleagues have critically evaluated this development in the journal Nature Computational Science.

Algorithms are already helping humans make decisions about therapies, jobs, and traffic flows. What risks do you see here as a statistician and social scientist?

Christoph Kern: AI and machine learning are developing rapidly – and are often being used for sensitive, socially relevant decisions. Our concern is that technical developments and conceptual understanding are diverging. Many of these high-performance systems are created without causal reasoning or clarity about the relevant influencing factors and objectives. And so we – researchers from LMU’s Department of Statistics and Munich School of Management together with colleagues from the University of Cambridge in England and the University of Maryland and Carnegie Mellon University in the United States – make the case that algorithmic decision-making should be employed with due consideration of causal concepts and their assumptions.

Can you give an example?

I’m researching unemployment offices, for instance. How can they make better decisions about which jobseekers should receive more support in looking for a job? An obvious approach would be to train a risk model that predicts whether somebody is at risk of long-term unemployment – and then offer that person job assistance. As sensible as this may sound, it is actually highly sensitive from a causal perspective. Incorrect modeling can quickly lead to unfair decisions. Moreover, the data situation is often highly complex – with information about résumés, job experience, and various job assistance measures. If important factors and relationships are overlooked, the process can go badly wrong: somebody could receive assistance who needs it less, while another person who urgently needs help goes away empty-handed.

Affecting hundreds of thousands of jobseekers

Which factors could that be?

We’re talking about so-called confounders: These are factors that influence both the decision and the outcome. In the jobs market, a person’s educational background and employment history can be decisive, but also structural aspects such as regional differences or institutional requirements and processes. In the medical sphere, factors like age, sex, underlying health conditions, lifestyle, and access to healthcare can influence both treatments and outcomes. What’s important to realize here is that if you overlook such mechanisms, then you risk drawing incorrect inferences. And so these are precisely the things that must be clarified at the outset.
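To make this concrete, here is a minimal simulation sketch (the variable names are invented for illustration; this is not code from the paper) showing how an overlooked confounder distorts a naive comparison between people who did and did not receive support:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

education = rng.normal(size=n)  # confounder
# The better educated are more likely to receive support...
support = ((education + rng.normal(size=n)) > 0).astype(float)
# ...and do better anyway: true effect of support is +0.5, education adds +1.0.
outcome = 0.5 * support + 1.0 * education + rng.normal(size=n)

# Naive comparison that ignores the confounder: biased far upward.
naive = outcome[support == 1].mean() - outcome[support == 0].mean()

# Adjusted estimate: regress the outcome on support AND education.
X = np.column_stack([support, education, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

print(f"naive difference: {naive:.2f}")    # roughly 1.6, not 0.5
print(f"adjusted effect:  {beta[0]:.2f}")  # close to the true 0.5
```

The naive group difference overstates the benefit of support because the better educated both receive it more often and find work more easily anyway; once the confounder is included in the model, the true effect is recovered.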

What kind of support do job seekers need? Causal AI can help tailor counseling and training services more effectively. | © IMAGO / Rene Traut

Before feeding an AI model with data, then, you need a clearly thought-out plan?

At the very beginning, you must ask yourself the question: What do I actually want to know? How high is the risk that somebody will remain unemployed for a longer period if they do not receive assistance? Or: What does a measure achieve – compared to not intervening? Such questions concern cause-effect relationships – pure correlations are insufficient. Statistical concepts like causal graphs and the potential-outcomes framework help systematically model cause-and-effect relationships. They furnish a conceptual and mathematical framework that clarifies under what conditions data-based decisions make sense in the first place.
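Stated compactly, in standard textbook notation (treatment W, covariates X, outcome Y; these symbols are generic, not drawn from the article):

```latex
% Potential-outcomes framework in standard textbook notation.
% For each person i, only one of the two potential outcomes is observed:
\[
  Y_i \;=\; W_i \, Y_i(1) \;+\; (1 - W_i) \, Y_i(0)
\]
% The causal target, e.g. the average treatment effect of a measure:
\[
  \mathrm{ATE} \;=\; \mathbb{E}\bigl[\, Y(1) - Y(0) \,\bigr]
\]
% Under unconfoundedness -- no unmeasured factors influence both the
% decision W and the outcome Y given covariates X -- the ATE is
% identified from observational data:
\[
  \mathrm{ATE} \;=\; \mathbb{E}_{X}\Bigl[\, \mathbb{E}[Y \mid W = 1, X]
    \;-\; \mathbb{E}[Y \mid W = 0, X] \,\Bigr]
\]
```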

If unemployment offices are to decide, for example, who should receive a particular kind of job assistance, it is not enough to simply predict the highest unemployment risk. After all, somebody might have already received lots of assistance in the past, something that cannot easily be ascertained in many data sources. Only when all relevant factors and relationships are clear and the model has been set up cleanly can machine learning help calculate the effects efficiently and on a large scale – say, for hundreds of thousands of jobseekers.
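One standard way to compute such effect estimates at scale is the two-model ("T-learner") approach: fit one outcome model per treatment arm, then score the whole population. The sketch below uses simulated data with randomized assignment for simplicity; the features and numbers are invented, not taken from real unemployment-office data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 5_000
X = rng.normal(size=(n, 4))     # stand-ins for education, history, region, ...
w = rng.integers(0, 2, size=n)  # 1 = received job assistance (randomized here;
                                # with observational data, confounders must be
                                # measured and adjusted for)
# Simulated outcome: the effect of assistance varies across people.
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + w * (0.5 + 0.3 * X[:, 0]) + rng.normal(size=n)

# Fit one outcome model per group...
model_treated = GradientBoostingRegressor().fit(X[w == 1], y[w == 1])
model_control = GradientBoostingRegressor().fit(X[w == 0], y[w == 0])

# ...then score the whole population: one estimated effect per person.
tau_hat = model_treated.predict(X) - model_control.predict(X)
print(f"estimated mean effect: {tau_hat.mean():.2f}  (simulated truth: 0.50)")
```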

Best fit instead of blind distribution

In what other sensitive areas is this important?

In the medical sphere, for example, our colleague Mihaela van der Schaar from the University of Cambridge analyzes the risks patients face under different treatment strategies. She investigates questions like: What would have happened if a certain therapy had not taken place? Such counterfactual questions can be modeled only with causal reasoning. In the domain of smart cities, meanwhile, there are examples such as adaptive traffic light systems, which are meant to improve traffic flow, or dynamic pricing for parking, where fees change depending on how busy the car park is and the time of day – in order to steer the behavior of road users and reduce emissions.

Another example from our research concerns the distribution of refugees. In Germany, refugees are usually prorated across the federal states according to the so-called Königstein key. But here, too, we could ask: In which federal state would a certain person have the best prospects of integrating? Answering such questions requires careful modeling.

You’re also calling for more interdisciplinary thinking in the development of such AI systems. Which disciplines do you have in mind?

First and foremost, a triad: Statistics furnishes the methodological basis for causal inference. Computer science provides the technical development, implementation, and scaling. And then we need the knowledge from the respective specialist domain – from doctors, traffic planners, social workers, or whoever it might be. These professionals understand the practical realities and know which assumptions are realistic. But the social sciences also play an important role – especially when it comes to the social use of algorithmic systems: Who makes decisions? How transparent are they? What role do humans play in conjunction with algorithms? And last but not least, survey research can contribute a lot of expertise – particularly on questions of data quality.

Uncertainty as necessary information

What are the characteristics of good training data for causal models?

Firstly, they must contain the relevant influencing factors – that is to say, not just the characteristics of the person in question, but also contextual information such as earlier measures, institutional rules, and specific regional factors. Secondly, they must cover a certain social breadth. If certain population groups – such as women, older people, people with disabilities, or people with migration backgrounds – are absent or strongly underrepresented in the data, even the best models cannot deliver reliable outputs for these groups. This risks embedding unfairness in the system.
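A simple first check along these lines is to audit group shares and outcome base rates before any model is trained. A minimal sketch, with invented column names and toy data:

```python
import pandas as pd

# Toy training data with an underrepresented group (all values invented).
df = pd.DataFrame({
    "age_band": ["25-34"] * 60 + ["35-49"] * 35 + ["65+"] * 5,
    "long_term_unemployed": [0, 1] * 50,
})

# Share of each group: very small shares flag groups for which the
# model's outputs will be unreliable.
print(df["age_band"].value_counts(normalize=True))

# Outcome base rate per group: large gaps deserve scrutiny as well.
print(df.groupby("age_band")["long_term_unemployed"].mean())
```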

According to your comment piece, AI models shouldn’t only suggest decisions, but also disclose their uncertainty. What do you mean?

In practice, the AI model often just presents a prediction along the lines of: “This person falls into the highest risk category of becoming long-term unemployed.” What’s missing here is how certain or uncertain this prediction is. In statistics, it’s common to report confidence intervals, but this has rarely been the case for algorithmic systems to date. Precisely in sensitive contexts, however, it can be crucially important to know how much confidence I can place in a model. If a system can say “I’m not sure here,” this will prompt people to take a closer look. Uncertainty should not be seen as a weakness, but as necessary information.
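One simple, model-agnostic way to attach such uncertainty to a risk score is a bootstrap ensemble: refit the model on resampled data and report the spread of its predictions. A minimal sketch with simulated data (the setup and numbers are invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2_000
X = rng.normal(size=(n, 3))  # invented features
y = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
x_new = X[:1]                # one case to be scored

# Refit the model on bootstrap resamples and collect its predictions.
risks = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)
    model = LogisticRegression().fit(X[idx], y[idx])
    risks.append(model.predict_proba(x_new)[0, 1])

lo, hi = np.percentile(risks, [2.5, 97.5])
print(f"risk: {np.mean(risks):.2f}  (95% interval: {lo:.2f} to {hi:.2f})")
# A wide interval is the system saying "I'm not sure here" -- a cue for
# a human to take a closer look instead of acting on the point estimate.
```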

How do you hope AI decisions will be handled in sensitive areas?

I hope that algorithmic decision-making systems will always be considered in relation to their social impacts. It’s not enough for a model to have high predictive quality. What’s decisive is whether it permits causal inferences that are reliable in real-world settings. This requires more than statistics and models: it also takes a clear understanding of the question, the availability of good data, transparency about assumptions – and open acknowledgement of uncertainty. We’re calling on research and practice to bring these elements together before AI models are used in high-risk areas.

Christoph Kern is Junior Professor of Social Data Science and Statistical Learning at LMU’s Department of Statistics. In his research, he addresses the question of how algorithmic decision-making systems can be used reliably and responsibly – especially in sensitive areas such as the jobs market or public administration. One of his primary focuses is on the limits of data-based predictions. Kern is also a project leader at the Mannheim Centre for European Social Research and previously worked, among other positions, in the Joint Program in Survey Methodology at the University of Maryland.

Publication

C. Kern et al.: Algorithms for reliable decision-making need causal reasoning. Nature Computational Science 2025
