Let’s talk about language - and its role for replicability (Xenia Schmalz, Anna Yi Leung & Johannes Breuer)

This blog post summarises the ReproducibiliTea session and accompanying paper presented by Xenia Schmalz, Anna Yi Leung, and Johannes Breuer, which looks at the nature of language and its role in replicability and in the practice of science more generally.
Replicability
Language
Methodology
English
Author

Stephanie Yam

Published

July 3, 2025

Introduction

In the ongoing discussion about the “replication crisis” in scientific research, the focus has mostly been on the improvement of statistical methods, data quality, and transparency. In their presentation based on their 2025 paper (written with additional co-authors), Xenia Schmalz, Anna Yi Leung, and Johannes Breuer look at the role of language in research, both as an object of study (in some fields) and as the primary medium through which research (more generally) is done and disseminated.

Replicability in science

Broadly speaking, replicability refers to the degree to which research reaches the same conclusion(s) when the methodology of an earlier study is repeated with new data (Schmalz et al. (2025); Parsons et al. (2022); The Turing Way Community (2025)). Studies whose findings are replicable thus provide a more robust evidence base than single studies and more strongly indicate that the results generalise to other research contexts as well as real-world settings. A general lack of replicability has been identified across academic disciplines, which raises questions both about the scientific process and about the credibility of the results and their subsequent use in theory-building and/or evidence-based decision and policy making.

Discussions about the issue of replicability in science have mostly focused on quantitative aspects of research. Accordingly, approaches such as pre-registration (particularly of quantitative studies), data and code sharing, and the application of more rigorous statistical methods have been foregrounded.

Replicability or reproducibility?

Note that replicability is typically distinguished from reproducibility, which involves repeating the analysis of a study with both the same methods and the same data.

Note

Large sets of studies in, e.g. psychology (Open Science Collaboration (2015); Klein et al. (2022)), medicine (Ioannidis (2005)), economics (Camerer et al. (2016)), and the behavioural and social sciences more generally (Camerer et al. (2018)) have identified a broad lack of replicability across these disciplines (Schmalz et al. (2025, 2)).

Language in research

Schmalz, Leung and Breuer discussed how, as the primary medium in which research is conducted, documented, and disseminated, “the improper or negligent use of language can pose another major challenge for replicability” (Schmalz et al. (2025, 2)).

The nature of language is inherently subjective, and its use in science can be vague and imprecise, or privilege one way of framing a situation over another. For replicability, for example, ambiguously formulated research hypotheses can invite distinct interpretations and different conceptions of how to test them and evaluate the results (e.g., Scheel (2022)). Such ambiguities can also increase researcher degrees of freedom, opening up multiple possible analytical strategies with differing results, blurring the line between confirmatory and exploratory research, and inflating false positive rates (e.g., Simmons, Nelson, and Simonsohn (2011)).

Tip

If you’re interested in reading more about the question of whether language can capture objective reality, check out Nick Enfield’s book: Language vs. Reality: Why Language Is Good for Lawyers and Not Scientists.

Language as research

The use of linguistic data can come with its own challenges for replicability. The talk and discussion focused on data access: commonly used data sources, even for majority (national) languages, often consist of audio-video recordings, news texts, and social media content controlled by (commercial) third parties. The media organisations and online platforms that host the data may regularly change or close the interfaces through which researchers can access it. Moreover, these platforms often do not allow researchers to download or share the data they work with, which limits the researchers’ ability to practise open science.

The authors/presenters also touched on other challenges for replicability in language/communication science, one of which I found particularly interesting: how the structure and origin of text data have implications for the replicability and reproducibility of language research. For example, transcription, i.e. the conversion of spoken language to written text, increasingly involves (semi-)automated pre-processing for high-resource (often Western European) languages. However, such processes rely on theoretical assumptions (see Ochs (1979)) that are built into particular models and may or may not be made explicit, and these data- and resource-intensive methods cannot be applied to lesser-resourced languages. Relatedly, the increasing use of Large Language Models for processing and analysing textual data raises concerns regarding the reliability and validity of the annotations, as well as the use of proprietary, generative software (see Nils Reiter’s ReproducibiliTea talk about reproducibility when using large language models).

Ways forward

To address issues of replicability related to researchers’ use and study of language, Schmalz, Leung, Breuer, and their co-authors recommend:

  • Being precise and explicit when using technical terms and formulating research questions and hypotheses
  • The collective refinement of ambiguous definitions in the literature
  • Giving preference to open-source tools and sharing material and data to enable replications
  • Documenting and explaining all steps undertaken in the processing and analysis of (language) data
  • Collaborative cross-cultural replication (in terms of both the diversity of the data and the scientists doing the replicating)

A graphic of a large blue speech bubble in the top left and a smaller green speech bubble in the bottom right

Created on canva.com from license-free items, 01/07/2025.

References

Camerer, Colin F., Anna Dreber, Eskil Forsell, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, et al. 2016. “Evaluating Replicability of Laboratory Experiments in Economics.” Science 351 (6280): 1433–36. https://doi.org/10.1126/science.aaf0918.
Camerer, Colin F., Anna Dreber, Felix Holzmeister, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, et al. 2018. “Evaluating the Replicability of Social Science Experiments in Nature and Science Between 2010 and 2015.” Nature Human Behaviour 2 (9): 637–44. https://doi.org/10.1038/s41562-018-0399-z.
The Turing Way Community. 2025. “The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research.” Zenodo. https://doi.org/10.5281/zenodo.15213042.
Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLOS Medicine 2 (8): e124. https://doi.org/10.1371/journal.pmed.0020124.
Klein, Richard A., Corey L. Cook, Charles R. Ebersole, Christine Vitiello, Brian A. Nosek, Joseph Hilgard, Paul Hangsan Ahn, et al. 2022. “Many Labs 4: Failure to Replicate Mortality Salience Effect With and Without Original Author Involvement.” Edited by Yoel Inbar. Collabra: Psychology 8 (1): 35271. https://doi.org/10.1525/collabra.35271.
Ochs, Elinor. 1979. “Transcription as Theory.” In Developmental Pragmatics, edited by Elinor Ochs and Bambi B. Schieffelin, 43–72. New York: Academic Press.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. https://doi.org/10.1126/science.aac4716.
Parsons, Sam, Flávio Azevedo, Mahmoud M. Elsherif, Samuel Guay, Owen N. Shahim, Gisela H. Govaart, Emma Norris, et al. 2022. “A Community-Sourced Glossary of Open Scholarship Terms.” Nature Human Behaviour 6 (3): 312–18. https://doi.org/10.1038/s41562-021-01269-4.
Scheel, Anne M. 2022. “Why Most Psychological Research Findings Are Not Even Wrong.” Infant and Child Development 31 (1): e2295. https://doi.org/10.1002/icd.2295.
Schmalz, Xenia, Johannes Breuer, Mario Haim, Andrea Hildebrandt, Philipp Knöpfle, Anna Yi Leung, and Timo Roettger. 2025. “Let’s Talk about Language—and Its Role for Replicability.” Humanities and Social Sciences Communications 12 (1): 84. https://doi.org/10.1057/s41599-025-04381-2.
Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22 (11): 1359–66. https://doi.org/10.1177/0956797611417632.