Positron and version control
Recommended references:
{data.table}, which we will introduce later. It also illustrates that a lot of R code is written not for statistical analysis (the final step) but for data management. The article also mentions some statistical techniques which you will meet in later courses (ignore the details for now).
It is widely acknowledged that the most fundamental developments in statistics in the past 60 years have been driven by information technology (IT). We should not underestimate the importance of pen and paper as a form of IT, but it is since people started using computers for statistical analysis that the role statistics plays in our research, as well as in everyday life, has really changed.
Although: “Let’s not kid ourselves: the most widely used piece of software for statistics is Excel.” /Brian Ripley (2002)
Early statistical computing relied heavily on:
📌 These languages required substantial programming expertise.
FORTRAN and C
FORTRAN is still used in R for subroutines such as least squares. Even in modern packages!
The same goes for C, which is popular for high-performance computing (fast execution time) and is used, for example, by lm.
Several dedicated statistical systems emerged:
📌 SAS predates S and influenced later statistical workflows.
SAS data
data_file.sas7bdat) from register holders!
Common limitations included:
Rscript script.R
~ operator), data frames, and modeling workflows
Positron assistant (mentioned in the video)
This feature will most likely be disabled in any secure working environment. Such environments often have strict rules about data privacy and security, which may conflict with the assistant’s functionality. Health data in SENSITIVE and SECURE environments must not be shared with external services, including AI assistants, to comply with data protection regulations and institutional policies.
It is recommended not to rely on such tools during the course (even if all our data is synthetic). If you start to rely on them, you might run into difficulties the day you work with real data: sharing such data could lead to prosecution for “brott mot tystnadsplikten” (breach of the duty of confidentiality), an offence under the Swedish Criminal Code (“Brottsbalken”), with a prison sentence as a possibility. Society puts an extreme emphasis on protecting health data, and rightfully so!
No package installer (yet). You need to use pak::pkg_install() or install.packages() etc.
No inline rendering of results in Quarto documents (yet)
You can use multiple active R sessions at once
Great integrated tools from VS Code and extensions, such as GitHub integration
1950s–1970s: Early software development relied on:
analysis_final.f
analysis_final_v2.f
📌 No automated tracking of changes.
1970s–1980s: Common practices included:
📌 Version control was social, not technical.
1980s: First-generation tools focused on single files:
Characteristics:
1990s: Project-level systems emerge:
Key features:
📌 Still required constant access to the central server.
Common problems:
These limitations became critical for large projects.
Design principles:
Some interactive learning tools:
You initiate a folder as a git project
Git will track all changes made in that folder
New files
Modified files (especially text, such as programming scripts/functions etc)
Deleted files
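The three kinds of change listed above can be seen directly with git status. The following is a minimal sketch, assuming Git is installed; the file names and the "Demo User" identity are hypothetical and only serve the illustration.

```shell
set -eu

# Work in a throwaway folder so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Initiate the folder as a Git project.
git init -q
git config user.name "Demo User"            # hypothetical identity, this repository only
git config user.email "demo@example.com"

# Create two scripts and record them in a first commit.
echo "x <- 1" > keep.R
echo "y <- 2" > remove.R
git add keep.R remove.R
git commit -q -m "Add two example scripts"

# Make one change of each kind.
echo "z <- 3" > new.R       # new file
echo "x <- 10" > keep.R     # modified file
rm remove.R                 # deleted file

# Short status codes: '??' = untracked (new), 'M' = modified, 'D' = deleted.
status_output=$(git status --porcelain)
echo "$status_output"
```

Note that the new file is only reported as untracked: Git notices it, but it becomes part of the history only after git add and git commit.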
Key ideas in Git:
📌 Collaboration becomes more flexible and robust.
Platforms built around Git:
They add:
Issue tracking is not part of Git but is implemented in most (if not all) hosting platforms.
Used for bug reports and discussions between developers and users
Issues can be closed when fixed/addressed but are still found in the history
Each issue gets a number (in order), and those numbers can be referenced in commit messages etc. (e.g. Fix #37, which will automatically close the issue)
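As a small illustration of referencing an issue from a commit message, here is a sketch that assumes Git is installed. The file content and the commit text are hypothetical; the issue number 37 is the one used in the example above. Git itself stores the message as plain text. Closing the issue is done by the hosting platform (on GitHub, typically once the commit reaches the default branch).

```shell
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"            # hypothetical identity, this repository only
git config user.email "demo@example.com"

# A hypothetical bug fix in an R script.
echo "mean_age <- mean(age, na.rm = TRUE)" > summary.R
git add summary.R

# The "#37" in the message is what the hosting platform links to the issue.
git commit -q -m "Fix #37: ignore missing ages when computing the mean"

# Show the subject line of the latest commit.
last_subject=$(git log -1 --format=%s)
echo "$last_subject"
```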
Modern usage includes:
Git is integrated into:
Today, version control supports:
📌 Version control is now a core professional skill.
(After installing the Git software)
cd path/to/your/project
git init
git status
# make changes to files
git add filename1 filename2
git commit -m "Descriptive message about changes"
git remote add origin <repository-url>  # replace <repository-url> with the address of your hosted repository
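To try the full cycle (init, add, commit, push, pull) without any account or network access, the sketch below uses a bare repository in a local temporary folder as a stand-in for the hosting platform. This is an assumption made for the illustration only; with GitHub, the remote would be a repository URL instead. All folder names, file names and identities are hypothetical.

```shell
set -eu

work_dir=$(mktemp -d)

# A bare repository in a local folder stands in for the hosting platform.
git init -q --bare "$work_dir/remote.git"

# init: turn a project folder into a Git repository.
mkdir "$work_dir/my_project"
cd "$work_dir/my_project"
git init -q
git config user.name "Demo User"            # hypothetical identity
git config user.email "demo@example.com"

# add (stage) and commit: record a first snapshot.
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"

# push: send the local history to the remote.
git remote add origin "$work_dir/remote.git"
branch=$(git symbolic-ref --short HEAD)     # "main" or "master", depending on Git settings
git push -q -u origin "$branch"

# A colleague clones the remote, commits a new script and pushes it.
git clone -q -b "$branch" "$work_dir/remote.git" "$work_dir/colleague"
cd "$work_dir/colleague"
git config user.name "Colleague"
git config user.email "colleague@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"
git push -q

# pull: bring the colleague's commit into your own folder.
cd "$work_dir/my_project"
git pull -q --ff-only

# Number of commits now visible in your folder.
commit_count=$(git rev-list --count HEAD)
echo "$commit_count"
```

The --ff-only option makes the pull stop instead of merging if the two histories have diverged, which is a cautious choice for a first experiment.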
The video below is a good starting point for understanding the basic concepts of Git and GitHub (and there are others to be found on YouTube).
Watch this video even though some parts might be overwhelming. It gives a good overview of the current state (2025), even though many things will be too advanced for this course (it is not specifically aimed at statisticians or R users).
.gitignore
The .gitignore file is very important in settings with health data! Pay close attention to this section of the video!
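A minimal sketch of how an ignore rule keeps data files out of version control, assuming Git is installed. The folder layout and file names are hypothetical and the CSV file only stands in for sensitive data.

```shell
set -eu

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

mkdir data R
echo "id,diagnosis" > data/cancer.csv       # stands in for sensitive data
echo "f <- function() 1" > R/function1.R

# Ignore everything inside the data folder.
echo "data/*" > .gitignore

# check-ignore -v prints the rule that causes a file to be ignored.
ignored_by=$(git check-ignore -v data/cancer.csv)
echo "$ignored_by"

# The data file no longer shows up among the files Git offers to track.
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```

An ignore rule only affects files that are not yet tracked. A data file that has already been committed stays in the history, so it is safest to create .gitignore before the first commit.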
Short official introduction from Microsoft:
More detailed introductions. Watch both! The first one is based on a Windows version of VS Code and the second on Mac, but the concepts are the same:
Overwhelmed?
This video includes some parts which might be overwhelming if you are new to Git and GitHub. Don’t worry! You don’t need to understand everything right away. Just try to follow along with the basic concepts and steps. You will get more comfortable with practice.
Common file structures
/.../my_project/
├── README.md - project documentation
├── TODO - what should be done next?
├── .git - handled by git (hidden folder)
├── .gitignore - used by git but your responsibility!
├── data/ - your data files (not under version control!)
│   ├── cancer.csv
│   └── patients.qs
├── R/ - your saved R functions
│   ├── function1.R
│   └── function2.R
├── reports/
└── _targets.R - targets pipeline script
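The layout above could be created from the command line roughly as follows. This is only a sketch: it builds the project inside a temporary folder (a hypothetical location), creates empty placeholder files, and leaves the data files out since those come from elsewhere.

```shell
set -eu

# Hypothetical location for the new project.
project_dir="$(mktemp -d)/my_project"

# Folders for data, R functions and reports.
mkdir -p "$project_dir/data" "$project_dir/R" "$project_dir/reports"
cd "$project_dir"

# Empty placeholder files, to be filled in later.
: > README.md
: > TODO
: > .gitignore
: > _targets.R
: > R/function1.R
: > R/function2.R

# Creates the hidden .git folder.
git init -q

# List the result, skipping the contents of .git.
find . -path ./.git -prune -o -print | sort
```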
README.md
Markdown format (simple text with some possible formatting)
data folder
Add data/* to your .gitignore file
R folder
reports folder
In ECS1 we will use Positron and git/GitHub in action!
Also see the “Reading and practicing” section above for a more in-depth introduction (homework).
Reflect on the use of different software and how the rapid development in this area interplays with other important aspects of our field
Be able to describe the principles of basic git commands (init, add, stage, commit, push, pull) and what they are used for (may be theoretical questions in the written exam)
Use Git and GitHub in practice (you can choose to do it either by commands or the GUI). This will be assessed in computer exercises and a later project.
Similarly, you need to organize your projects according to best practice (but this will be the focus of EL5).