The Institute of Phonetics and Speech Processing (IPS) is a leading language research institute with a strong focus on practical application. Its director, Professor Jonathan Harrington, explains why this research is so important.
Professor Harrington, what issues does the subject of phonetics address?
Phonetics tries to explain how phonemes (or speech sounds) are transmitted between a speaker and a listener in spoken language. This is important because, on the one hand, the process is idiosyncratic: The same utterance is never enunciated twice in the same way, due to individual attributes such as anatomy, dialect and a person’s individual linguistic style. On the other hand, enunciation is adapted to the given speaking situation – whether you are talking with friends or delivering a lecture, say.
At the same time, spoken communication is subject to the grammar of the language. Grammar prescribes how words are put together to form sentences, for example, and which speech sounds are used to form words. In German, for instance, both the /k/ and the /n/ are pronounced in words – like Knie – that begin with /kn/. In English, however, the /k/ is not pronounced in the word knee.
What exactly do you do in your research here?
We try to understand how the individual use of language shapes and alters the grammar that is shared by all speakers. It is important to model this interaction between the individual’s way of speaking and abstract grammar in order to reconstruct how the spoken language changes over time, how new dialects emerge and what principles determine the differences between sounds in the languages of the world.
Do you see yourself rather as a ‘treasure hunter’, a ‘preserver’ or perhaps even as a ‘savior’ of language?
We are more like observers. First, because we constantly observe – and sometimes imitate – how phonemes are spoken in everyday situations. And second, because we posit falsifiable hypotheses based on our observations and test them using empirical methods. Seen from this angle, the process is very similar to that in many other areas of science.
A glance at the list of IPS projects reveals a broad range of topics with numerous areas of application. Where does demand exist for your expertise?
The speech databases we collect and manage at the Bavarian Archive for Speech Signals have been and still are used in all kinds of academic and technical projects, one example being the development of virtual language assistants such as Siri and Alexa. In the form of what are known as web services, the software we develop has become the global standard for the automated transcription and annotation of speech data. With these tools, our partners currently process 52 different languages, and more than 22 million speech recordings have been processed in this way since the software was launched in 2013.
Do you also see yourself as a service provider?
I do, yes – especially in research. Many international scientists with little experience of phonetics come to us wanting to learn phonetic skills that they can use for their own research.
Moreover, our online tools are increasingly also being used outside of phonetics – in the field of oral history, in sociology and in teaching German as a foreign language, for example. We are also part of the national research data initiative Text+ and see it as our job to communicate the outcomes of our work to a wider public audience.
Speech observation and research for a variety of applications are the main areas of work of the Institute of Phonetics and Speech Processing.
You interface with many other academic disciplines. Which of them are especially important?
We have important touchpoints with such subjects as psychology in the context of the human processing of speech, and with computational linguistics and computer science in the field of natural language processing. There is also common ground with medical issues such as motor speech disorders and language impairment resulting from genetic factors, anomalies in the maturing of children’s brains or accidents involving relevant injuries.
This is a focal area of our Clinical Neuropsychology Research Group, which draws on phonetic insights and methods to be able to understand, diagnose and, where possible, treat such impairments.
What trends and challenges lie ahead in the future?
Big data will play a major role. It already does, for example as science tackles dialects and hitherto largely unresearched languages. We have amassed a wealth of expertise that enables us to store and analyze ever larger volumes of data.
We also want to broaden our interdisciplinary research with the Nuclear Magnetic Resonance (NMR) Group at the Max Planck Institute (MPI) for Multidisciplinary Sciences in Göttingen. Here, the aim is to analyze the physiology of speech using magnetic resonance imaging (MRI). Based on the research being done in Göttingen, very-high-resolution MRI images can be taken in real time while someone is speaking. Our cooperation with them makes it possible to record and analyze much larger volumes of MRI data for the physiology of speech than ever before.
Beyond that, the IPS is an international pioneer in researching sound change. Part of this work involves the comparative analysis of speech recordings spanning several decades. The significance of this kind of research will increase as far more data of this type becomes available in the future.
Information technology plays an important part in your discipline. What opportunities does AI give you to identify completely new areas of research?
The huge advances made in AI-based speech technology over the past five years were driven by models trained using vast quantities of spoken-language data. On the one hand, this has led to improvements in a series of applied speech processing methods such as the semi-automated recognition of speech sounds.
On the other hand, this development creates new challenges: Despite the impressive progress made, the current status of AI-based speech processing provides scarcely any information about the issues that interest us in basic research. It gives us no deeper insight into the cognitive process of spoken communication, the processes of sound change or the social characteristics of spoken language. On the contrary, rather than helping us understand our own brains, AI is giving us another problem: that of understanding artificial brains. AI projects can synthesize natural-sounding language almost perfectly, but they tell us very little about human speech. Instead, they actually conceal what they do.
Real-time MRI showing movements in the mouth and throat during speech. The IPS researchers use this technique in cooperation with partners from the Max Planck Institute for Multidisciplinary Sciences in Göttingen.
In your view, what distinguishes human speech from AI speech?
Although AI-generated speech is often of an astonishingly high quality, it is rooted in statistical models that were ultimately derived from massive stores of written language data. One consequence is that AI-generated speech comes across not as authentic, but always as though it were read from a script. Another is that it cannot contain any genuinely new and creative linguistic innovations – nothing original. From this perspective, listening to AI-generated speech for a long time is fairly boring, precisely because the technology cannot authentically reproduce certain core aspects of spoken language such as emphasis, intonation and rhythm.
The IPS has a very international staff. What part does that play in your research?
The internationalization of the institute in particular and of the discipline of phonetics in general is nothing really new, although it has definitely become more pronounced over the past two decades. Our research reaps exceptional benefits from this footprint, because the composition of the team also reflects the international importance of the institute. Apart from that, most of our research output appears on international platforms. And in phonetics, we describe and compare a very large number of different languages and dialects in relation to the system of spoken human language. So it is very important to have the input of native speakers of widely differing languages.
You train a lot of students. What will they do when they have finished their studies? What are their prospects?
Graduates who opt for a non-academic career find exciting assignments at companies that develop automated speech processing products. They include tech firms such as Microsoft, Amazon and BMW, but also midcaps with highly specialized portfolios. Our graduates have a solid command of project management and statistical data analytics and are conversant with the use of databases. They can work empirically and have a finely tuned ear, which opens up all kinds of possibilities on the labor market. There are also successful spin-offs from the institute, including Neolexon, which develops speech therapy apps.
The Institute of Phonetics and Speech Processing turns 50
The Institute of Phonetics and Speech Processing (IPS) was set up at LMU in 1972 by Professor Hans Günther Tillmann. Professor Gerd Kegel extended it in the direction of psycholinguistics as early as 1977. Since 1980, analysis of the physiology of speech production has been one focus of its research. The founding of the Bavarian Archive for Speech Signals (BAS) in 1997 gave the IPS a leading position in the development of web-based speech processing services. Professor Jonathan Harrington has been Director of the Institute since 2006 and quickly led it to the very forefront of international phonetic research. On his watch, phonetics, speech technology and psycholinguistics have been successfully merged. Headed by Professor Wolfram Ziegler, the Clinical Neuropsychology Research Group (EKN) was established, adding a neurophonetic focus to the work of the IPS. The acquisition of three advanced grants, two starting grants and one proof-of-concept grant from the European Research Council (ERC) underscores its position as one of the most successful