News

Between art and AI: Björn Ommer explores deep learning and computer vision

29 Jul 2022

Newly appointed to LMU, computer scientist Björn Ommer collaborates with colleagues in the humanities and the neurosciences – and is attached to two faculties.

Professor Björn Ommer

Teaching machines to see is one of Professor Björn Ommer’s goals. That said, seeing is only one step along the road to a different and greater challenge: autonomous understanding. “I have a keen interest,” the computer scientist says, “in discovering how we as humans make sense of what we see.” He wants machines to learn to do the same.

Since fall 2021, Ommer has held the newly established Chair of AI for Computer Vision and Digital Humanities/the Arts at LMU. His position is attached to both the Faculty of History and the Arts and the Faculty of Mathematics, Informatics and Statistics. His working group conducts basic research into computer vision and machine learning, focusing in particular on how they can be applied within the digital humanities.

“Deep learning has come on in leaps and bounds in recent years,” Ommer explains. “We suddenly find cars that really do drive autonomously. Artificial intelligence (AI) is assisting with medical diagnostics … Many of the things we have been researching for years are now springing up as prototypes and are there for the public to see.” Yet there are plenty of new questions that keep him busy in his research: “Of special relevance for the humanities is the subject of ‘retrieval’: learning how things can be found in large image databases, like the proverbial needle in a haystack,” Ommer says. “In one project we are assembling half a millennium’s worth of art – using the same algorithms with which we also investigate Banksy’s street art or any Google images.” Alongside Art History, the professor is also working together with the Institute of Assyriology. “Using AI, we are deciphering cuneiform script on clay tablets. It is not fully automated, but is helping us.”
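The interview does not explain how such retrieval works under the hood. As a rough, illustrative sketch of the general idea, embedding images with a pretrained network and ranking a collection by similarity to a query, the following Python snippet uses a standard torchvision backbone; the file names and the choice of model are placeholders, not details from Ommer’s projects.

```python
# Minimal sketch of content-based image retrieval (illustrative only):
# embed images with a pretrained CNN, then rank a collection by cosine
# similarity to a query image. Paths and model choice are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained ResNet-50 as a generic feature extractor (classifier head removed).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval().to(device)

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> torch.Tensor:
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    feat = backbone(img).squeeze(0)
    return feat / feat.norm()          # L2-normalize for cosine similarity

# Hypothetical image collection and query.
collection = ["artwork_001.jpg", "artwork_002.jpg", "artwork_003.jpg"]
index = torch.stack([embed(p) for p in collection])   # (N, 2048) embeddings
query = embed("query_detail.jpg")

scores = index @ query                                 # cosine similarities
ranking = scores.argsort(descending=True)
print([collection[i] for i in ranking])                # most similar first
```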

Moving the sunrise

Ommer has no intention of automating other disciplines. “I see the computer more as a tool for us as humans. And I want it to be a lot better than it is at the moment.” The problem with pictures in particular is the ‘semantic gap’: “If I want to process images in Photoshop, I still have to work with individual pixels. Yes, I can group them together into regions. But the computer doesn’t understand if I say: ‘Take a sunrise and move it in the sky.’” What Ommer is aiming for is a “more natural way of dealing with the machine”, a “content-based comprehension” of pictures, for example, so that “the computer understands people better.” In this context, he says, the relatively young discipline of informatics can “learn a lot from centuries-old art history”.

“My background is in informatics,” Ommer says, having studied it with physics as a minor subject at the University of Bonn. At ETH Zurich, he then earned his doctorate in informatics with a thesis on “Learning the Compositional Nature of Objects for Visual Recognition” in 2007. As a postdoctoral researcher, he worked in the Computer Vision group at the University of California, Berkeley, before accepting a professorship at the Faculty of Mathematics and Informatics at the University of Heidelberg in 2009. Until his subsequent move to LMU, he also served as Co-Director of the Interdisciplinary Center for Scientific Computing, as well as being affiliated to the Faculty of Philosophy and the Faculty of Physics.

Another major aim of his research involves self-supervised and more efficient learning. “As things stand, we still shovel huge volumes of data into the machine learning process and use lots of annotations to explain them to the computer.” Ommer likens this to saying “airplane” to a small child every time one appears in the sky. “But we want the computer to be able to derive meaning from the data autonomously, with minimal supervision. We want it to learn what airplane means on its own.”
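To make the contrast with annotation-heavy training concrete, here is a toy, self-contained sketch of self-supervised (contrastive) learning: the network never sees a label such as “airplane”, only the constraint that two augmented views of the same unlabeled image should receive similar representations. The tiny encoder, the crude augmentation and the random data are placeholders, not the group’s actual setup.

```python
# Toy self-supervised (contrastive) learning: no labels, only the signal
# that two augmented views of the same image should embed close together.
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(               # stand-in for a real vision backbone
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 128),
)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def augment(x):
    # Crude stand-in for real augmentations (crops, color jitter, ...).
    return x + 0.1 * torch.randn_like(x)

images = torch.rand(16, 3, 32, 32)            # an unlabeled batch

for step in range(10):
    z1 = F.normalize(encoder(augment(images)), dim=1)   # embeddings of view 1
    z2 = F.normalize(encoder(augment(images)), dim=1)   # embeddings of view 2
    logits = z1 @ z2.T / 0.1                             # pairwise similarities / temperature
    targets = torch.arange(len(images))                  # each view should match its counterpart
    loss = F.cross_entropy(logits, targets)              # InfoNCE-style contrastive objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```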


Visual synthesis is another field of research for Ommer. Here, the same process is reversed: “I no longer tell the computer to find a person in the image database. Instead, I say: ‘Show me a person who is this many years old, has this gender, is this tall …’ This significantly more difficult formulation helps us, too, to see what the machine has understood and what it hasn’t.” For users, this means working with images no longer in the pixel space but “in the modeled space”. Ommer again: “I enter a picture of someone, encode it and add modifications such as ‘change the gender’. This can go as far as only entering text, on the basis of which the machine synthesizes pictures,” the researcher says. Even laypeople can already use simple text prompts to generate images: “‘A bird the way Picasso would have painted it, in front of a sunrise’, for example.”
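The article does not name a particular system. As one hedged example of how a layperson might generate an image from a text prompt today, the snippet below uses the open-source diffusers library with a publicly available text-to-image checkpoint; both the library and the checkpoint are assumptions made here for illustration.

```python
# Hedged sketch: generating an image from a plain-text prompt with an
# off-the-shelf text-to-image model. The library (diffusers) and the
# particular checkpoint are illustrative assumptions; the article itself
# does not name a specific system. Requires a GPU and a model download.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",     # example public checkpoint, not from the article
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a bird the way Picasso would have painted it, in front of a sunrise"
image = pipe(prompt).images[0]
image.save("picasso_bird.png")
```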

On the downside, Ommer notes that the AI algorithms needed for this kind of research and its applications are growing ever more complex. “Even experts often no longer know how the computer actually arrives at its decision,” he admits. “As the machine’s performance increases, our understanding of it decreases.” In medical diagnostics or accounting, this can be critical: “In cooperation with neuroscientists from ETH Zurich, for example, we used AI to analyze the movements of patients with neurodegenerative conditions.” From their patterns of movement, conclusions were drawn about what might have happened in the cortex during a stroke, or whether a treatment was working. AI served as a non-invasive diagnostic tool. “But we owe the patient an explanation,” Ommer states. “Are they sick or not? What is their life expectancy? Or in the legal realm: Guilty or not guilty? All these piles of numbers churned out by AI, by ever more complicated deep learning models, must be explainable if people are to trust them.”

This leads to a further research objective, “interpretable AI”: the aim is to create models that can analyze another model post hoc – after it has already been fully trained – and then explain its decision-making. There are cooperative ventures with the German automotive industry, for example. “We want to make autonomous driving not just go faster, better, farther, but also safer, by making it transparent.”
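As a generic illustration of what a post-hoc explanation can look like, the sketch below computes a simple gradient-based saliency map for an already trained classifier, showing which input pixels most influenced its decision. It stands in for the broad family of techniques described here, not for the specific models Ommer’s group develops.

```python
# Rough sketch of one post-hoc explanation technique (gradient-based
# saliency): after a model is fully trained, ask which input pixels most
# influence its decision. Generic illustration only.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()

x = torch.rand(1, 3, 224, 224, requires_grad=True)   # placeholder for a real, preprocessed image

logits = model(x)
predicted = logits.argmax(dim=1).item()               # the decision to be explained
score = logits[0, predicted]
score.backward()                                      # gradient of that score w.r.t. the input

saliency = x.grad.abs().max(dim=1).values             # per-pixel importance map, shape (1, 224, 224)
print(saliency.shape, "- higher values mark pixels that influenced the decision more")
```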

Agile gray matter

What motivates Ommer to collaborate with disciplines as varied as the humanities, the neurosciences and the automotive industry is the versatility of the human brain. “We use the same brain in a Zoom meeting, when driving and when working as a doctor, say. And for computers, I have in mind algorithms that will let them scale up or down to deal with the same kind of widely differing issues.” Why? To bring the machine a little closer to humans on the semantic level – and, not least, to make working with these machines “a little less frustrating”.
