Digital media: “We often don’t have the right data”

Prof. Frauke Kreuter | © Fotostudio klassisch-modern

Frauke Kreuter is Professor of Statistics and Data Science in the Humanities and Social Sciences at LMU. She is also a member of the task force set up by the National Academy of Sciences Leopoldina, the German Academy of Technological Sciences (acatech) and the Union of German Academies of Sciences, which issued a statement on “Digitalization and Democracy” this month.

The Leopoldina has just issued a set of recommendations on the topic of “Digitalization and Democracy”. What prompted this step?

Frauke Kreuter: There are actually several reasons for drawing attention to the issue. Digital media have transformed the landscape in which political views are developed and disseminated. This in turn has given rise to justified fears that digitalization has the potential to become a destabilizing influence on societal cohesion.

This was exemplified by the storming of the US Capitol in Washington, DC, on 6 January 2021. Social media also serve as a vehicle for promoting movements that could pose a danger to democracies. That’s enough to make anyone nervous. But the members of the task force by no means regard social media as an unhealthy development. On the contrary, the advent of digital media has had many positive effects. For example, they give less strident voices and more reasonable views a better chance to be heard.

But there is a risk that they give the loudest voices more prominence than they deserve. So it’s important to take a closer look and consider how digital infrastructure in democracies should be organized in future.

Access to databases for researchers

How much empirical information is available on the subject?

Frauke Kreuter: A large body of empirical findings has been published. The problem is that the data needed to find out, and understand, what’s really going on in digital media are often inaccessible.

In the context of social media, many attempts have been made to analyze the impact of trolls or foreign governments on the evolution of political opinions during election campaigns. Empirical studies show that attempts have indeed been made to influence the outcome of elections. But when it comes to defining and quantifying the effects of such efforts, we are left with many unanswered questions.

There have been some very positive initiatives, such as Social Science One in the US, that have tried to share anonymized Facebook data with other researchers. But the interesting data remain in the hands of a few platform operators. One of the recommendations made by our task force is that steps should be taken to make it easier for researchers to gain access to these databases.

Algorithms lack transparency

Algorithms play a central role in digital media. Can you give me some examples of how they work?

Frauke Kreuter: Algorithms are used to organize content and to analyze how different types of content are used. For instance, they select the content most likely to interest each user, based on that individual’s search patterns. In principle, this is one way in which digital media could contribute to the radicalization of users and potentially lead to a misrepresentation of the diversity of public opinion.

The problem here is the lack of transparency. Algorithms are based on correlations and probabilities. They take account of many variables and learn from the combinations that were effective in the past. This means that it’s essentially impossible to determine why any individual item is displayed. This is a problem that turns up everywhere algorithms are employed. Another example is their use in personalized advertising, also referred to as microtargeting.
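To make the idea of selecting content “based on each individual’s search patterns” more concrete, here is a deliberately simplified Python sketch. It assumes hypothetical data in which each item carries a list of topic labels and the user’s past clicks are known; the functions `interest_profile` and `rank_items` are illustrative names, not any platform’s actual method. Real feeds use learned models that weigh many variables at once, which is exactly why their individual decisions are hard to explain; this toy version is transparent only because it follows a single rule.

```python
from collections import Counter

def interest_profile(click_history):
    """Count how often each topic appears among the items a user clicked before."""
    return Counter(topic for item in click_history for topic in item["topics"])

def rank_items(candidates, click_history, k=3):
    """Hypothetical toy ranker: score each candidate by how well its topics
    match the user's past clicks and return the ids of the top-k items."""
    profile = interest_profile(click_history)
    scored = [(sum(profile[t] for t in item["topics"]), item["id"]) for item in candidates]
    # Highest score first; ties are broken by id so the order is deterministic
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:k]]

# Hypothetical example data
history = [
    {"id": "h1", "topics": ["politics", "protest"]},
    {"id": "h2", "topics": ["politics"]},
]
candidates = [
    {"id": "a", "topics": ["politics", "protest"]},
    {"id": "b", "topics": ["sports"]},
    {"id": "c", "topics": ["politics", "economy"]},
    {"id": "d", "topics": ["science"]},
]
print(rank_items(candidates, history))  # ['a', 'c', 'b']
```

With a history dominated by political content, political items rise to the top and unrelated topics are pushed down: a small-scale illustration of how ranking by past interest can narrow what a user sees.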

Increasing people’s awareness of how digital media work

That’s not something the average user is likely to consider. Should personalization of content be regulated or prohibited?

Frauke Kreuter: Personalization of content as such is not necessarily problematic. It is quite legitimate for me to be shown things that are of interest to me. It can also be very practical, for example if the shoes I’m shown are shoes likely to please me. The important question is: how much personalization can democracy take?

About two years ago, platforms such as Facebook introduced a feature that lets users find out why specific commercial items appear in their newsfeeds. But few users are aware that algorithms determine what they are shown, or know how to find out why.

This is why it is so important to increase people’s awareness of how digital media work. Not everyone is capable of building an automobile, but everyone can learn to drive safely. And I would apply this model to this new world, which is powered by algorithms. Not everyone is able to program an algorithm. But it should be possible to develop a feeling for the roles that algorithms play in the digital world, to be more skeptical about their purpose, and to recognize that “this content was generated specifically for me”.

The power of tech platforms

How does your work investigate questions like these?

Frauke Kreuter: My research focuses on algorithms and fairness. Who gets to see what sorts of content? How and when do algorithms reflect social change? After all, they learn from historical training data and therefore tend to reproduce outmoded attitudes for longer than necessary.

In many areas this is relatively easy to recognize. My canonical example is searching for images via Google. Until a few years ago, when you entered the search term ‘university professor’, all the professors shown were white males. At some point, Google likely discovered the issue of diversity, and suddenly African-American professors appeared on the screen. Since then, Google has tweaked the algorithm so that it no longer shows only images of the individuals most frequently searched for. Now the selection of university professors includes an (almost) equal proportion of women.

This underlines the power these platforms have. The earlier algorithm wasn’t misleading: there are still more male than female professors. However, somebody likely made the conscious decision that, instead of relying on the historical training data and reproducing the past, it would be better to alter the algorithm so that the results now reflect a more equal distribution, one that in many institutions is more aspirational than actual.
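The kind of deliberate intervention described here can be illustrated with a generic re-ranking technique. The Python sketch below is not Google’s algorithm, which is not public; it simply assumes a relevance-ordered list of image results, each tagged with a hypothetical group label, and interleaves the groups round-robin so that the top of the list is more evenly distributed than the historical data alone would produce. The function name `rerank_balanced` and the example data are illustrative.

```python
from collections import deque

def rerank_balanced(ranked, group_of):
    """Interleave items from each group in round-robin order while keeping
    each group's original relevance order (a generic re-ranking technique)."""
    queues = {}
    for item in ranked:
        queues.setdefault(group_of[item], deque()).append(item)
    groups = list(queues)  # groups in order of their first appearance in the ranking
    result = []
    while any(queues[g] for g in groups):
        for g in groups:
            if queues[g]:
                result.append(queues[g].popleft())
    return result

# Hypothetical ranking produced by a model trained on historical data
ranked = ["m1", "m2", "m3", "f1", "m4", "f2"]
group_of = {item: ("male" if item.startswith("m") else "female") for item in ranked}
print(rerank_balanced(ranked, group_of))  # ['m1', 'f1', 'm2', 'f2', 'm3', 'm4']
```

Once the smaller group is exhausted, the remaining items follow in their original order, so the top of the list is balanced even though the underlying pool is not. Whether, and how far, to impose such a distribution is a conscious design decision rather than something the data dictate.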

In order to promote greater awareness of the interplay between algorithms and societal circumstances, our statement on “Digitalization and Democracy” recommends integrating relevant expertise in the humanities and the social and behavioural sciences into the curricula of STEM disciplines at university level, and teaching basic technical, mathematical and methodological skills in all subjects. In addition, there should be compulsory courses in research and data ethics.

Looking at the quality of the training data

In your research, you analyze algorithms using statistical methods. What are you looking for?

Frauke Kreuter: Algorithms make predictions, and predictions can be wrong. The degree of uncertainty associated with a prediction rises if the data on which it is based are sparse. If the error probabilities vary between different groups in a sample, the results can be distorted and unfair. Algorithms can also learn from discriminatory practices used against specific groups in the past, and could in turn be used to perpetuate social inequalities.
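Both points, that sparse data make predictions less certain and that error rates can differ between groups, can be checked with simple statistics once predictions and true outcomes are available. The Python sketch below assumes hypothetical records of the form (group, predicted, actual) and reports, per group, the error rate p together with its binomial standard error, sqrt(p(1 - p)/n), which grows as the number of observations n shrinks. The function name and data are illustrative, not taken from the task force’s statement or from Kreuter’s own studies.

```python
import math
from collections import defaultdict

def error_rates_by_group(records):
    """For records of (group, predicted, actual), return
    {group: (error_rate, standard_error, n)} for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total]
    for group, predicted, actual in records:
        counts[group][0] += int(predicted != actual)
        counts[group][1] += 1
    result = {}
    for group, (errors, n) in counts.items():
        p = errors / n
        # Binomial standard error (normal approximation): larger when n is small
        se = math.sqrt(p * (1 - p) / n)
        result[group] = (p, se, n)
    return result

# Hypothetical data: group A is well represented, group B is sparse
records = (
    [("A", 1, 1)] * 90 + [("A", 1, 0)] * 10
    + [("B", 1, 1)] * 7 + [("B", 0, 1)] * 3
)
for group, (p, se, n) in sorted(error_rates_by_group(records).items()):
    print(f"{group}: error rate {p:.2f} +/- {se:.3f} (n={n})")
```

In this made-up example, group B has both a higher error rate (0.30 versus 0.10) and a much larger standard error (about 0.145 versus 0.030). For very small groups the normal approximation itself becomes unreliable, so an exact or Wilson interval would be the more careful choice; the sketch keeps the simpler formula for readability.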

In the example of the female professors mentioned above, the statistics are known. We know what we can, at best, expect to see. In many other contexts this is not the case. So, we also study issues relating to the quality of the data and the error probabilities. This involves taking a very close look at how the data were generated: Which sets of data were chosen for the training of algorithms? And is the quality of the training data for all social groups equally good?

Forming research partnerships

So, you could assess the effects of the algorithms used on digital media if you had access to the relevant data?

Frauke Kreuter: Yes. Some tests can be applied to algorithms even when they are a black box. But the greater the level of transparency, the easier it is for us to assess whether algorithms are compatible with the principles of fairness, meaning that AI decisions are consistent with fundamental democratic values and fundamental rights, in particular the principles of equal treatment and protection against discrimination.

So, our task force also recommends the formation of research partnerships with platform operators, with a view to gaining access to informative data for research purposes, and studying what is actually happening with digital media.

I am confident that, in this respect, we can look forward to positive changes in the near future. That’s also the impression I get in my discussions with colleagues who work for Facebook and Google. The dominant platforms are now aware of the risks. The reorganization of our digital infrastructure is a mammoth task, which can only be undertaken collaboratively.

For more information on digitalisation and democracy, see:

Statement: Digitalisation and Democracy

Initiative: Social Science One