Research data: Storage and access to Big Data

13 Jul 2021

Federal and State governments support the development of a National Infrastructure for Research Data. LMU teams are involved in four of the new projects.


Big Data just gets bigger and bigger. With every passing day, the volumes of data generated grow. The world’s researchers alone produce huge amounts of digitized data. However, these data are stored in a wide variety of formats on a plethora of platforms. In order to maximize their usefulness, they must first be gathered together, standardized and appropriately processed, so that they can be readily accessed and productively used – irrespective of their original format or source database – by researchers in all disciplines and locations.

In 2020 Federal and State governments in Germany began to fund a series of large-scale projects with a view to developing a National Infrastructure for Research Data (NFDI). The initiative aims to systematically collect, collate and store, in readily accessible form, all data generated by publicly funded research projects in a wide range of disciplines. Ten new collaborative projects have now been selected for funding. LMU researchers will play a leading role in the four Consortia described below.


In the coming five years, this Consortium will create a national platform for the integration and analysis of Big Data accumulated in the fields of Economics and the Social Sciences. This will be a Cloud-based system, which will collate, curate and analyze data and algorithms, and make them accessible for scholars to share. Particular emphasis will be placed on the management of unstructured data, such as those available on social media, in company reports and in the material released by economic and governmental institutions. The researchers will make use of AI-based algorithms to compile and process these types of data, which have become ever more important sources for researchers in Economics and the Social Sciences. The participants in the project are committed to the principles of open access and open science. Consequently, users will be encouraged to provide input and feedback. In return, the scientific community can expect to derive great benefit from the data management tools developed during the project.

LMU’s spokespersons for the project are Frauke Kreuter, Bernd Bischl and Göran Kauermann, Professors at the Department of Statistics. Responsibility for project coordination rests with Mannheim University. In addition to these institutions, the universities of Hamburg and Cologne, the Institute for Employment Research in Nürnberg, the Leibniz Information Center for Economics in Kiel, the Leibniz Institute for the Social Sciences in Mannheim, and the Mannheim University Library are members of the Consortium.


This project is a response to the digital revolution in photon and neutron research, and its increasing role in disciplines including biology, pharmacology, engineering, physics, chemistry, geology and archaeology. All of these fields are faced with the challenge of meeting the rising demand for rapid analyses of large datasets and faster data transmission rates, which requires that data can be readily located and accessed, and are mutually compatible and re-usable. Photon and neutron research carried out at major facilities alone generates more than 28 petabytes of data per year, and individual experiments can produce as many as a million files. Not only must these huge datasets be rapidly, efficiently and fully analyzed, they must remain accessible for subsequent use. The task of this Consortium is to construct an infrastructure which makes that possible.

The Consortium is made up of researchers based in universities, research institutes and major research facilities, who will be led by committees responsible for the areas of synchrotron and neutron research, respectively. LMU’s spokesperson for the Consortium is Paola Coan, Professor of Medical Physics.


Data derived from mathematical research cover the gamut from databases for special functions and mathematical objects to vital elements of scientific computing, such as models and algorithms. While specialized search engines make it possible to locate mathematical documents and software, they are seldom linked to data. In addition, data volumes are expanding rapidly, and rates of data generation are steadily increasing as the role of Mathematics in data science expands. In diverse disciplines (such as physics, chemistry, engineering, the biosciences and the humanities), this has led to a rapid increase in mathematical research, and in the use of mathematical models of steadily increasing complexity. In collaboration with the mathematical community, MaRDI will develop the methods and tools needed to build an infrastructure for the systematic collection, organization and use of mathematical research data via decentralized, interconnected storage facilities. MaRDI is intended to serve as a high-quality repository of mathematical research data, and as a digital service portal for the mathematical community.

The Consortium will be coordinated by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) in Berlin, and encompasses 13 other partner institutions (including universities and research institutes, as well as the German Association of Mathematicians), in addition to LMU. LMU’s spokesperson in the Consortium is Bernd Bischl, Professor of Statistical Learning and Data Science.


The goal of this Consortium is to integrate research data derived from the fields of Particle Physics, Nuclear Physics and Astronomy in a transparent form and make them permanently accessible. This will be done with the aid of novel methods for the management and provision of large-scale databases for further research use. The principal goal of the project is the construction of a platform that enables all kinds of research data generated in digital form to be maintained in accessible formats and combined in intelligent ways. In addition to the primary areas of physics mentioned above, the work done by PUNCH4NFDI will also benefit researchers in other fields, both within and beyond the boundaries of physics, including statistical physics, the physics of soft materials, biological physics, non-linear dynamics, atmospheric physics, oceanography, climate research, geophysics and geodesy, mathematics and informatics.

PUNCH4NFDI will be coordinated by DESY. The task area “Data Transformations” is led by Thomas Kuhr, Professor of Physics at LMU. In addition to DESY and LMU, the Consortium has 22 other members, including universities and other research institutions. The LMU contingent includes researchers in the fields of high-energy physics, astronomy and cosmology. LMU’s spokesperson in the Consortium is Joseph Mohr, Professor of Physics and Director of the Department of Cosmology and Structure Formation at the LMU Observatory.
