| Weekday | Time | Shared¹ | Compulsory | Activity | Session | Title | Subtitle | Literature |
|---|---|---|---|---|---|---|---|---|
| w4 | | | | | | | | |
| Mon | 10:15-12 | X | | Lecture | EL1 | Intro | Overview of this part of the course | (Nguyen 2022, ch. 1), (Ludvigsson et al. 2009), (Laugesen et al. 2021) |
| Wed | 08:15-12 | X | | Lecture | EL2 | European legislation | GDPR, EHDS and DA | (Vukovic et al. 2022), (Nguyen 2022, ch. 4) |
| w5 | | | | | | | | |
| Mon | 10:15-12 | X | | Lecture | EL3 | Swedish legislation | Laws and regulation | (“Public Access and Secrecy \| Swedish National Data Service” 2025), (Görman 2024) |
| w9 | | | | | | | | |
| Mon | 10:15-12 | | | Lecture | EL4 | Tooling | Positron and version control | (“The Unix Shell: Summary and Setup” 2026), (Rodrigues 2023, ch. 4) |
| | 13:15-16 | | | Computer training | ECS1 | IDE and version control | Positron, git and GitHub | |
| Wed | 10:15-12 | | | Lecture | EL5 | Reproducibility | {targets} and safe environments | (Nguyen 2022, ch. 2), (Baker 2016), (Oliveira Andrade 2025), (Kavianpour et al. 2022), (Peng and Hicks 2021) |
| | 13:15-16 | | Compulsory | Seminar | ES1 | Ethics and Legality | | |
| w10 | | | | | | | | |
| Wed | 10:15-12 | | | Lecture | EL6 | Data formats | data.table, SQL, DuckDB, Parquet, SAS, JSON, API | (Nguyen 2022, ch. 2), (Wickham, Çetinkaya-Rundel, and Grolemund, n.d., ch. 21), (Fenk, Furu, and Bakken, n.d.), (Data Analysis Using Data.table, n.d.) |
| | 13:15-16 | | Compulsory | Computer training | ECS2 | Pipeline and reproducibility | targets and understanding the data | |
| w11 | | | | | | | | |
| Mon | 10:15-12 | | | Lecture | EL7 | Medical coding | ICD, ATC, KVÅ, regex | (Nguyen 2022, ch. 3), (Wickham, Çetinkaya-Rundel, and Grolemund, n.d., ch. 15), (Bindel and Seifert 2025), (Alharbi, Isouard, and Tolchard 2021), (Nelson et al. 2024) |
| | 13:15-16 | | Compulsory | Computer training | ECS3 | Data formats and regex | | |
| Wed | 11:15-13 | | | Lecture | EL8 | Health care registers | From cradle to grave | (Hiyoshi 2026), (Ludvigsson et al. 2016) |
| | 14:15-17 | | | Computer training | ECS4 | Data project | | |
| w12 | | | | | | | | |
| Mon | 10:15-12 | | | Lecture | EL9 | Documentation | Quarto | (Wickham, Çetinkaya-Rundel, and Grolemund, n.d., ch. 28-29) |
| | 13:15-16 | | | Computer training | ECS5 | Presentation | Quarto | |
| Wed | 10:15-12 | | | Lecture | EL10 | Biggish data | | |
| | 13:15-16 | | | Computer training | ECS6 | Project | Quarto | |
| w13 | | | | | | | | |
| Mon | 08:15-12 | X | | Lecture | EL11 | Recap | L1, L4-L9 | |
| | 13:15-16 | | Compulsory | Seminar | ES2 | Project presentations | | |

¹ Shared session for the whole course (AGE and EB).
Health data
This website includes documentation for lectures and exercises concerning the health data part of the course STA220 Health data and questionnaires. General information about the course is found on the corresponding Canvas page.
Plan
- See Canvas/TimeEdit for schedule.
- Note that this plan only considers the health data part of the course!
- See details below regarding the literature.
- Some literature is only used partially (as described in the handouts).
- The literature is associated with each lecture/session but is not required reading before the session (unless otherwise stated).
- The plan, as well as the content of each session, is preliminary and may change before the respective session is scheduled.
- Modifications to lecture slides (and thus indirectly to the lecture handouts) may be made even after the session has taken place. Such changes are only intended to clarify points discussed during the session or to correct mistakes.
Software
To complete the exercises and assignments for this course, you need to install the following software (all of which is free and available for all major operating systems):
Some additional recommended software (also free but not mandatory for the course):
Positron extensions
Positron is an integrated development environment (IDE) for R. It makes it easy to write and execute R code, manage projects, and visualize data. It is made by Posit (formerly RStudio) PBC and is free for individual use.
Positron is based on Code OSS, the open-source version of VS Code from Microsoft. As such, it supports a wide range of extensions and customization options.
We will use the GitHub Pull Requests extension. Search for it in the Extensions pane in Positron and install it.
Literature
Course books are available to GU students through “O’Reilly Learning for Higher Education”. Instructions for access
Primary course book. Not every chapter of the book will be discussed during the course, and some programming examples in the book are written in languages that we will not use, as our programming focus is mainly on R.
Additional mandatory reading
PDF versions of scientific articles are found in Canvas.
DISA exam
- The literature listed in the table above (literature column) is used for examination.
- Some references, however, are only partially used (see handouts and lecture slides).
- Some of the literature is examined implicitly as part of the project part of the course.
The DISA exam will examine:
The course book Nguyen (2022) (selected parts of chapters 1–4)
All PDFs found in the Canvas module
- Most tables and detailed methods sections can be skipped
Lecture handouts, excluding:
The written examination will assess your understanding of the main themes covered in the course. The purpose of the exam is not to test memorization of details but to assess your ability to explain concepts and reflect on methodological issues in health data research.
Structure of the exam
- Section A – Conceptual understanding
In this section you will answer a number of short questions about key concepts from the course. Your answers should be brief (usually 1–3 sentences). The goal is to demonstrate that you understand the terminology and the purpose of the concepts discussed in the course.
- Section B – Applied understanding
In this section you will be asked to explain or interpret practical situations related to health data analysis. Your answers should be somewhat longer (typically 4–6 sentences). The goal is to show that you can connect concepts from the course to real research settings.
- Section C – Reflection and synthesis
The final question asks you to reflect on the broader research workflow in modern health data analysis. This part of the exam focuses on your ability to combine ideas from several parts of the course. A good answer should show that you can reason about how these elements interact in real research projects.
Podcast episode
This “podcast episode” has been generated by NotebookLM based on the lecture handouts and course literature. It might provide a helpful summary of the course.