Visualisation in R

The Grammar of Graphics

Elen Le Foll

University of Cologne

25 July 2025

Today’s plan ☀️

  1. Why visualise data?
  2. Key principles of effective data visualisation
  3. The Grammar of Graphics (Wilkinson 2005)
  4. Its implementation in ggplot2 (Wickham 2016)
  5. Practice makes perfect. 🤓
  6. Where to next? 🚀

My own #dataviz journey

Winter, Tatjana & Elen Le Foll. 2022. Testing the Pedagogical Norm: Comparing If-conditionals in EFL Textbooks, Learner Writing and English Outside the Classroom. International Journal of Learner Corpus Research 8(1). 31–66. https://doi.org/10.1075/ijlcr.20021.win

Le Foll, Elen. 2021. Register Variation in School EFL Textbooks. Register Studies 3(2). 207–246. https://doi.org/10.1075/rs.20009.lef.

Your prior experiences with R

Forms response chart. Question title: How much experience do you have of using the programming language R? Number of responses: 17.

Your prior experiences with ggplot2

Forms response chart. Question title: How much experience do you have of using the ggplot2 library? Number of responses: 17.

Why visualise?

For yourself

  • To explore your data
  • To detect data processing errors and outliers
  • To check assumptions of statistical tests or models
  • To examine variation across different subsets of the data
  • To better interpret the results of statistical analyses

For others

  • To communicate the results of your analyses more effectively
  • To communicate about your data (in more detail)
  • To communicate complex information more efficiently
  • To attract the reader’s attention
  • To allow the reader to reach their own conclusions

Your prior experiences 📊

Forms response chart. Question title: Which tool(s) have you used so far to create plots and figures for your academic work? Number of responses: 17.

Choosing the “right” plot

  • What kind of data are we dealing with? 🍎🍊

Numeric variables

Categorical variables

Choosing the “right” plot

  • What kind of data are we dealing with? 🍎🍊
  • What is the aim of the visualisation? 🎯
  • What is our research question? 🧐
  • Who is our target audience? 🧑🧑🧒🧒
  • In which format will the visualisation be shared? 👩🏻‍💻

Frequency of the f-word in the BNC

Reaction times of L1 and L2 speakers

The Grammar of Graphics

  • Syntax
  • Semantics

The hexagonal logo of the ggplot2 package features a graph with dots that are connected by a line.

The ggplot2 package (Wickham 2016)

The Grammar of Graphics

A layered diagram illustrating the key elements of data visualization: Data, Aesthetics, Geometries, Facets, Statistics, Coordinates, and Theme, each shown as a layer with a distinct colour.

The syntax of the Grammar of Graphics (Wilkinson 2005) as visualised in the QCBS R Workshop Series (CC-BY-NC-SA).
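The layers named in the diagram can be sketched in ggplot2 code, one layer per line. This is a minimal illustration rather than the workshop's own example: it assumes the ggplot2 package is installed and uses the `mpg` dataset that ships with it, chosen here only because it needs no download.

```r
library(ggplot2)

# Data: the `mpg` dataset bundled with ggplot2 (an assumption made for illustration)
p <- ggplot(data = mpg,
            # Aesthetics: map variables to visual properties
            mapping = aes(x = displ, y = hwy)) +
  # Geometries: draw one point per observation
  geom_point(alpha = 0.5) +
  # Statistics: add a linear trend line computed from the data
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Facets: one panel per drive type
  facet_wrap(~ drv) +
  # Coordinates: Cartesian coordinates (the default), stated explicitly
  coord_cartesian() +
  # Theme: control the non-data elements of the plot
  theme_minimal() +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

# Display the plot
p
```

Each `+` adds one component, so any line after the first can be removed or swapped without rewriting the rest of the plot.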

Welcome to the tidyverse! 🪐

Hexagonal stickers flying in space. The stickers all represent tidyverse packages including stringr, tidyr, readr, tibble, and dplyr.

Artwork by @allison_horst. CC-BY 4.0.

Tidy data

Stylized text providing an overview of Tidy Data. The top reads “Tidy data is a standard way of mapping the meaning of a dataset to its structure. - Hadley Wickham.” On the left reads “In tidy data: each variable forms a column; each observation forms a row; each cell is a single measurement.” There is an example table on the lower right with columns ‘id’, ‘name’ and ‘color’ with observations for different cats, illustrating tidy data structure.

Tidy data illustration from the Openscapes blog post “Tidy Data for reproducibility, efficiency, and collaboration” by Horst and Lowndes (2020). CC-BY 4.0.
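As a rough sketch of what the tidy data principles mean in practice, the code below reshapes a small table so that each row holds one observation. The participants and scores are invented for illustration, and the sketch assumes that the tidyr package is installed and that one observation is "one participant's score on one test".

```r
library(tidyr)

# Hypothetical scores, invented for illustration only
wide <- data.frame(
  participant = c("P1", "P2", "P3"),
  grammar     = c(78, 85, 62),
  vocabulary  = c(55, 70, 48)
)

# In `wide`, the variable "test" is hidden in the column names.
# pivot_longer() moves it into its own column, giving one row per observation.
tidy <- pivot_longer(
  wide,
  cols      = c(grammar, vocabulary),
  names_to  = "test",
  values_to = "score"
)

tidy
```

The long format is usually the one ggplot2 expects, because `test` can now be mapped to an aesthetic such as colour or to a facet.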

Data

For the rest of this workshop, we will work with open data from:

Dąbrowska, Ewa. 2019. Experience, Aptitude, and Individual Differences in Linguistic Attainment: A Comparison of Native and Nonnative Speakers. Language Learning 69(S1). 72–100. https://doi.org/10.1111/lang.12323.

The two datasets are available from the IRIS database:

Dąbrowska, E. (2018). L1 data [Data set]. Retrieved from https://www.iris-database.org/iris/app/home/detail?id=york:935513

Dąbrowska, E. (2018). L2 data [Data set]. Retrieved from https://www.iris-database.org/iris/app/home/detail?id=york:935514

Literate programming with Quarto

A schematic representing the versatility of Quarto: multi-language input (e.g. Python, R, Observable, Julia) and multi-format output (e.g. PDF, HTML, Word documents, and more).

Artwork by Allison Horst from the “Hello, Quarto” keynote by Julia Lowndes and Mine Çetinkaya-Rundel, presented at the RStudio Conference 2022.

Let the fun begin!

Artwork by @allison_horst. CC-BY 4.0.

The native R pipe

Collage of two images. The first is the famous painting by René Magritte of a pipe with the caption "Ceci n'est pas une pipe" (French for "This is not a pipe"). The second, in the same style and colours, shows the native R pipe operator and its keyboard shortcut with a French caption meaning "This is a pipe".

Remix of René Magritte’s “La Trahison des images” (1928-1929) with the native R pipe and its RStudio shortcut (based on an image from Wikiart.org). This image is licensed under CC-BY 4.0 Elen Le Foll 2025.
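A small sketch of what the native pipe `|>` does, assuming R 4.1 or later (the version that introduced it). The reaction times are invented for illustration. The pipe passes the value on its left as the first argument of the function on its right.

```r
# Hypothetical reaction times in milliseconds, invented for illustration
rt <- c(512, 430, 655, 498, 720)

# Nested calls are read from the inside out
nested <- round(mean(log(rt)), 2)

# With the native pipe, the same steps read from left to right
piped <- rt |> log() |> mean() |> round(2)

# Both versions perform the same computation
identical(nested, piped)
```

The piped version lists the steps in the order in which they are carried out, which tends to be easier to follow once several data-wrangling steps come before a plot.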

Plotting your way to success 🗺️

Good luck with your projects! 🍀

Artwork by @allison_horst. CC-BY 4.0.

References

Horst, Allison, and Julie Lowndes. 2020. “Openscapes - Tidy Data for Efficiency, Reproducibility, and Collaboration.” https://openscapes.org/blog/2020-10-12-tidy-data/.
Roemling, Dana, Bodo Winter, and Jack Grieve. 2024. “Visualizing Map Data for Linguistics Using Ggplot2: A Tutorial with Examples from Dialectology and Typology.” Journal of Linguistic Geography 12 (2): 69–83. https://doi.org/10.1017/jlg.2024.11.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. New York: Springer. https://ggplot2.tidyverse.org.
Wilkinson, Leland. 2005. The Grammar of Graphics. Second edition. Statistics and Computing. New York, NY: Springer. https://doi.org/10.1007/0-387-28695-0.

How to reuse and cite

Please use the following citation for attribution:

Le Foll, Elen. 2025. Visualisation in R: The Grammar of Graphics. Workshop presented at the First Summer School on Linguistic Creativity, Bielefeld. https://osf.io/xchqd/. (25 July, 2025).