
“More transparency about how science works”

29 Jan 2024

Why can some studies not be replicated, and what can we learn from this? An interview with psychologist Mario Gollwitzer and sociologist Andreas Schneck.

When scientific studies are repeated, the outcome should be the same. A number of years ago, it became apparent in the discipline of psychology that, for many prominent experiments, this was not the case. The discipline experienced a “crisis of replication”. Has this since been overcome?

Professor Mario Gollwitzer and Dr. Andreas Schneck address this topic in the interview reproduced below. The two LMU academics conduct research into whether the replicability of scientific studies can be improved (META-REP). In the interview, they explain why some findings do not hold up, which subjects are affected and what this means for researchers and their training.

Let’s begin our conversation by looking back: What triggered the crisis of replication in the discipline of psychology?

Mario Gollwitzer: In 2015, an attempt was made to replicate 100 prominent experiments from the realms of general and social psychology. In the original studies, 97% of all the tests performed confirmed the given hypothesis. But in the repeat studies, this was the case in only 36% of tests. That obviously came as a shock to the whole scientific community.


Mr. Schneck, in a new publication you reviewed no fewer than 35,000 psychological studies that were published between 1975 and 2017. Why was that necessary?

Andreas Schneck: My study looks at how many supposedly significant findings are not in fact significant and how many really do have a kernel of truth to them. We call this the false discovery rate. Incidentally, I did this in the field of psychology not because it is the black sheep of the social sciences, but because it is the only discipline that, since the 1970s, has applied very strict standards to reporting on statistical tests. It is only thanks to this standardization that we can use automated methods to generate a sufficiently large data set in order to investigate this question. In sociology, which is my discipline, reports are produced in ways that are too heterogeneous.

What did you find in your study?

Schneck: I wanted to get a first indication of how problematic the situation really is. If we assume that the researchers did not falsify or manipulate their data or use any other forms of trickery, the false discovery rate, subject to certain assumptions, is roughly 7 percent, which is actually relatively good. However, if we factor in academic misconduct, the figure rises substantially, to 16 percent. That would indeed be an awful lot of false positive results. The same problems undoubtedly also exist in sociology.
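To make the concept concrete, a false discovery rate can be worked out from three quantities: the significance level, the average statistical power and the share of tested hypotheses for which a real effect exists. The short calculation below is a minimal illustrative sketch; its input numbers are assumptions chosen for illustration, not figures taken from Schneck's study.

```python
# Illustrative false-discovery-rate calculation; all input numbers are assumptions.
alpha = 0.05        # significance level: share of true null hypotheses that still test "significant"
power = 0.60        # assumed average power: share of real effects that are actually detected
share_true = 0.50   # assumed share of tested hypotheses where a real effect exists

false_positives = alpha * (1 - share_true)   # hypotheses without an effect that look significant
true_positives = power * share_true          # real effects that are detected

fdr = false_positives / (false_positives + true_positives)
print(f"False discovery rate: {fdr:.1%}")    # roughly 8 percent under these assumptions
```

Under these illustrative assumptions, about one in thirteen significant findings would be a false positive; practices that inflate the number of false positives push the rate up accordingly.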

Mario Gollwitzer and Andreas Schneck research how the replicability of scientific studies can be improved. | © LMU/Stephan Höck

Similar replication projects in various disciplines

Could these problems also affect other disciplines?

Gollwitzer: Similar replication projects to those conducted for psychology in 2015 have been undertaken in economics, for example. The number of significant effects that were successfully replicated in this case was 61%, which was higher than in psychology but still far from satisfactory.

Neuroscience is affected, too. Here, the problem is that usually only a small number of test subjects can be investigated, if only because of the intricate methods needed. Small numbers of cases, however, reduce statistical power, that is, the chance of finding an effect if one actually exists. This problem is now more widely recognized and is changing practices in the discipline. One project in the META-REP priority program concerns itself with the replicability of electroencephalography (EEG) studies, i.e. experiments in which brain activity is measured. More than 20 teams have joined forces to replicate effects found in earlier studies. Only by pooling resources in this way is it even possible to arrive at acceptable numbers of cases.

Statistical power depends primarily on the number of cases

What can lead to false positives being published?

Schneck: How much a statistically significant result actually tells us has to do with what is known as the statistical power of an experiment, i.e. how likely it is that true effects are detected in the first place. If the power is low, significance does not actually tell us very much. And, put simply, statistical power depends primarily on the number of cases. With only a very few cases, it is unlikely that we will be able to detect effects with any statistical precision.
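The relationship between the number of cases and statistical power can be shown with a short simulation. The sketch below is purely illustrative and assumes a two-group comparison with a medium-sized true effect; it repeatedly draws samples and counts how often a standard t-test reaches significance.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def estimated_power(n_per_group, effect_size=0.5, n_experiments=5000, alpha=0.05):
    """Share of simulated experiments in which a genuinely existing effect is detected."""
    hits = 0
    for _ in range(n_experiments):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(effect_size, 1.0, n_per_group)  # a true effect is present
        if ttest_ind(control, treatment).pvalue < alpha:
            hits += 1
    return hits / n_experiments

for n in (10, 20, 50, 100):
    print(f"n = {n:3d} per group -> estimated power = {estimated_power(n):.2f}")
```

With only ten cases per group, the simulation detects the true effect in well under half of the experiments; with a hundred cases per group, it is found almost every time.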

Things get worse still if certain forms of academic misconduct are added to the equation, i.e. if non-significant findings are made to look like significant findings. This is a practice known as p-hacking: the data are run through one statistical analysis after another until apparently significant results for effects that do not actually exist happen to turn up. One example is when outliers – extreme data points – are excluded without the reasons for doing so being stated in advance. Especially in sociology, where large data sets based on observational data (derived from surveys, for instance) are often used, there are also ways and means to play around with variables for as long as it takes until, by some coincidence, a significant finding pops out.
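How quickly such practices inflate the number of apparently significant findings can likewise be illustrated with a simulation. The sketch below is a deliberately simplified, hypothetical example rather than a reconstruction of any real analysis: an outcome made of pure noise is tested against many equally random variables, and the "study" counts as a discovery as soon as any one comparison crosses the 5 percent threshold.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

def hacked_study(n_subjects=50, n_variables=20, alpha=0.05):
    """Return True if any of many unrelated variables appears to 'predict' a noise outcome."""
    outcome = rng.normal(size=n_subjects)            # pure noise, no real effect anywhere
    for _ in range(n_variables):
        predictor = rng.normal(size=n_subjects)      # also pure noise
        r, p = pearsonr(predictor, outcome)
        if p < alpha:                                 # stop at the first "significant" result
            return True
    return False

n_studies = 2000
rate = sum(hacked_study() for _ in range(n_studies)) / n_studies
print(f"Share of noise-only studies yielding a 'significant' finding: {rate:.0%}")
```

Although not a single real effect exists, roughly two out of three such simulated studies end up with an apparently significant result, far above the nominal 5 percent error rate.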

Do we have to assume that this is done in bad faith?

Schneck: No, not at all. I believe one reason is that the incentive structures in academia lean heavily toward rewarding discoveries. Non-significant findings are harder to sell.

Gollwitzer: During the review procedures for academic journals, researchers are rewarded if they can show that a hypothesis was correct and that the collected data back it up. On the other hand, when findings do not confirm the hypothesis, the reviewers and journal editors often ask: What does this tell us, then?

Schneck: It is no different in sociology. There is a certain pressure to produce significant findings, in part so that you don’t find yourself opposing a large body of theoretical literature that sees an effect as valid.

How to increase replicability

What is the best way to deal with academic misconduct?

Gollwitzer: In the META-REP priority program, we try to take a constructive approach to the question of how to increase replicability. One key question is what good meta-research and replication research should look like. How do you design a replication study in such a way that it has even a chance of replicating a positive result? We do not yet really have a good answer to this question. Does it even make sense to take a study from 1977 and repeat it using exactly the same materials in 2023? It is audacious indeed to assume that the results will be the same almost 50 years on.

That said, there are also movements within the academic community that pursue a more destructive approach. For example, there are those who have committed themselves to the cause of publicly exposing people who can be assumed to have engaged in academic misconduct. I see that as very problematic, both morally and from the perspective of sociological research.

LMU has an Open Science Center. What can open access to research data do to help make scientific findings replicable?

Gollwitzer: In my discipline, a heated debate is currently raging over how to weight the costs and benefits of various measures – whether the disclosure of data can indeed help to substantially improve replication rates because others actually use and review them, for example. The opportunity is there for that to happen, but too few people do it. In psychology, there is an incredible volume of data sets in the public domain. Astonishingly, however, others have seldom made use of them to date.

Why not?

Gollwitzer: Because no one understands them. It would take days for me to take a data set on the Open Science Framework and understand what exactly has been coded, what the meaning of the variables is and whether the data are even reliable. We now have ideas about how to harmonize the coding. But not enough use is yet being made of this openness and transparency. At the same time, it has to be said that the bureaucratic effort involved is very considerable.

Schneck: That also applies, in a slightly different form, to working with sociological data. The modeling process, too, should be made transparent in the analysis code. Depending on how the code is documented, it can take a huge amount of time to reconstruct how everything works. It also depends on whether data capture was designed to be open from the word go, or whether the data were simply made available at short notice after the event. If that is not clear, it takes an awfully long time to work your way into the analytical process and understand it.

Communicating the limits and opportunities of empirical studies

What does the problem of reproducing studies mean for media and society? Could people get the impression that science is no longer trustworthy?

Gollwitzer: One of the issues my chair addresses is the topic of “trust in science”. If you explicitly tell people: “This study arrived at this finding. But be careful, the results are only tentative. We don’t yet know whether we can replicate them,” researchers are always afraid that this additional information will cause people to lose confidence. However, research shows that trust in science does not decrease as a result. Conversely, what really does erode trust is being told that there is a huge crisis in academia and that only 30 percent of studies can be replicated. That scares people and makes them wary – which is essentially a good thing. If we put these statements together, we realize that greater transparency is needed about how science works, what its limits are and what opportunities it presents.

Schneck: The media work differently from science, obviously. What matters in the media is the unheard-of occurrence, the game-changing moment, the highly topical issue. And you are obviously more likely to find that in individual studies than in large-scale meta-analyses. Epistemologically, you often only know after some time how significant this or that finding is, whether the study is robust and can be replicated, or whether the result remains a one-off. So it is important to establish a culture that is open about errors, that uses open materials, collaborative projects and so on.

Is replicability an issue among your students, too?

Schneck: It is. As part of the master’s program at the Department of Sociology, students are expected to replicate a study in a thorough, well-founded manner in their final research internship.

Gollwitzer: We attach great importance to the issue of replication in our study courses. In part, that is because, in Munich, many of the researchers who were here in 2015 have done a tremendous amount of work on exploring how bad the replication problem is and what we can do about it. That is why Munich is a pioneering institution in the field of psychology.

At various points during the bachelor’s degree course, we have added little empirical projects in which small groups of students take their own research questions from planning through data collection to analysis. For years now, we have been careful to ensure that this all happens in line with current standards of openness. Analyses are preregistered, the data are placed in the public domain and there is a code book. We now even have code checking, where students review each other’s analysis code. So, we are already doing a lot to introduce and practice an open error culture as early as the bachelor’s courses, and more and more universities are now following suit.

Mario Gollwitzer coordinates the META-REP project. | © LMU/Stephan Höck

What would you say to readers of science news?

Gollwitzer: We need to raise awareness among consumers of science news of how science works and of the fact that it rarely delivers absolute proof.

But I think that responsibility for the critical reception of science-related content lies with them least of all. In my opinion, it is rather the academics who need to think about how they formulate certain statements when talking to the media. As should journalism itself, of course, which condenses and repackages these statements. We need to avoid pushing society in the direction of radical skepticism. That is not a constructive attitude, because skepticism should not lead to a fundamental lack of trust.

Mario Gollwitzer is Professor of Social Psychology at LMU’s Department of Psychology. Born in 1973, he studied and earned his doctorate in psychology at the University of Trier before moving to the University of Koblenz-Landau, where he led the Methods, Diagnostics and Evaluation Center. After eight years at Philipps-Universität Marburg, Gollwitzer came to LMU in 2018.

Andreas Schneck reviewed 35,000 psychological studies. | © LMU/Stephan Höck

Dr. Andreas Schneck is a research fellow at Professor Auspurg’s Chair of Quantitative Social Research at LMU’s Department of Sociology. He is also Co-Principal Investigator in the META-REP subproject “Enhancing the Robustness of Observational Social Science Research by Computational Multi-Model Analyses”.

After studying sociology at the University of Konstanz, he earned his doctorate at LMU in 2019 with a thesis on academic misconduct.

For more information on this topic, see:

Event:

Panel debate on 6 February 2024: How robust does science have to be before it can be communicated to the public? How can the need for appropriate caution be reconciled with the need to be sufficiently informative? And who is actually responsible for this: academia or the media? On 6 February 2024, the Bavarian Academy of Sciences will host a panel debate on these questions under the aegis of the META-REP priority program. The participants will include science journalists, staff from the META-REP project and members of the board of the German Research Foundation (DFG).

Publication:

Andreas Schneck: Are most published research findings false? Trends in statistical power, publication selection bias, and the false discovery rate in psychology (1975–2017). In: PLOS ONE, 2023

Project:

META-REP (a meta-scientific program to analyze and optimize replicability in the behavioral, social and cognitive sciences) is funded by the German Research Foundation (DFG) as a priority program and coordinated at LMU by Mario Gollwitzer.
