SAIN Utrecht
AI Safety in the heart of the Netherlands. Building a multidisciplinary community at Utrecht University and beyond, with international backing and a growing research focus.
Events
Past & upcoming events
Upcoming events — see details on lu.ma/bsv03bzb.
Past events this academic year
Events: eventsutr@safeainetherlands.org
Programs
What we run in Utrecht
Courses
AI Safety Fundamentals
SAIN Utrecht's AI Safety Fundamentals series has run for three editions, reaching more than 100 participants: BSc/MSc students, researchers, engineers, and public-sector professionals. Each week spotlights a different theme, so newcomers can drop in at any session and keep coming back for the next.
Course outline (weekly themes)
- 1. Introduction: High-level overview of AI safety, why it matters now, capabilities & diffusion, types of risks, and solution families (technical & governance).
- 2. Risks & incidents: Social harms, frontier risks, misuse, and loss of control; real-world cases and how research lags behind deployment.
- 3. Technical AI safety: Robustness & jailbreaking, scalable oversight, alignment, evaluations, and mechanistic interpretability.
- 4. Regulation & governance: EU AI Act, accountability, lifecycle governance, evidence & audits (e.g. the NIST AI RMF functions GOVERN, MAP, MEASURE, MANAGE), and obligations for generative and general-purpose AI (GPAI).
- 5. Why safety is hard: Industry incentives, funding gaps, expert disagreement, slow regulation vs. fast-moving labs, and geopolitical race dynamics.
- 6. Safety in practice: Red-teaming, checklists, tooling, pathways (fellowships, thesis topics, community), and a bridge to hackathons & deeper study.
Course details
- Duration: 6 weeks
- Sessions: ~60 minutes each
- Modular: join any week
- Format: interactive (short input and Q&A)
- Venue: Utrecht University
- Audience: students, researchers, professionals
Topics and timings may shift by cohort — email us for the latest schedule.
Technical AI Safety
SAIN Utrecht is the first SAIN chapter to teach from the ARENA materials, bringing that curriculum to our community so participants get a hands-on, research-aligned technical track alongside our broader programming.
1. Reinforcement learning & LLMs
Proximal Policy Optimisation (PPO)
PPO is a policy-gradient method that improves the stability and sample efficiency of deep RL by keeping each policy update close to the previous policy, while balancing exploration and exploitation across environments ranging from robotics to games. You'll implement an agent on CartPole, train it toward strong performance, and experiment with reward shaping and extensions (e.g. Atari, MuJoCo).
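The mechanism that keeps PPO updates small is its clipped surrogate objective. Below is a minimal, framework-free sketch in plain Python, assuming per-action log-probabilities under the old and new policies and advantage estimates are already computed. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices, not taken from the ARENA code.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximised).

    Hypothetical helper for illustration; inputs are plain lists of floats.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum gives a pessimistic bound: the objective gains
        # nothing from pushing the ratio outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# A ratio of 1.5 with advantage 2.0 is capped at 1.2 * 2.0.
print(ppo_clip_objective([math.log(1.5)], [0.0], [2.0]))
```

In a full agent this quantity is negated and minimised with an autodiff framework, usually alongside a value-function loss and an entropy bonus that encourages exploration.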
RL from Human Feedback (RLHF)
RLHF fine-tunes models against a reward signal derived from human preferences, typically with PPO as the optimisation backbone. The track walks through a full RLHF implementation that builds on the PPO work, moving from classic RL setups to autoregressive language models with TransformerLens and covering the objective, the rollout and learning phases, and an end-to-end RLHFTrainer-style workflow.
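A common ingredient of RLHF objectives is a KL penalty that stops the fine-tuned policy from drifting too far from a frozen reference model. The sketch below shows one widely used sequence-level formulation in plain Python. It assumes the per-token log-probabilities are already available, and the name `kl_penalised_reward` and the default `kl_coef=0.1` are hypothetical; ARENA's exercises may structure this differently.

```python
def kl_penalised_reward(reward, logp_policy, logp_ref, kl_coef=0.1):
    """Sequence reward minus a KL penalty towards the reference model.

    reward:      scalar score for the whole generated sequence
    logp_policy: per-token log-probs of the generated tokens under the policy
    logp_ref:    per-token log-probs of the same tokens under the reference model
    """
    # Summed per-token log-ratios: a single-sample estimate of
    # KL(policy || reference) for this sequence.
    kl_estimate = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    # Subtracting the penalty discourages the policy from "reward hacking"
    # with degenerate text that scores well but drifts from the reference.
    return reward - kl_coef * kl_estimate


print(kl_penalised_reward(1.0, [-1.0, -2.0], [-1.5, -2.5]))
```

PPO then optimises this shaped reward in place of the raw score; a larger `kl_coef` keeps generations closer to the reference model at the cost of slower reward gains.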
2. Mechanistic interpretability
The track starts with a grounding in transformers and mechanistic interpretability, then branches into probing, circuits, and toy models, mirroring ARENA's syllabus. Exercises mix notebooks with tools such as TransformerLens, nnsight, SAELens, Neuronpedia, and more.
Syllabus map (high level)
- Foundations: transformers from scratch; intro to mech interp
- Probing & representations: linear probes; function vectors & model steering; interpretability with SAEs (scale & prerequisites on superposition); activation oracles
- Circuits in LLMs: indirect object identification; SAE-based circuits & transcoders
- Toy models: balanced brackets; grokking & modular arithmetic; OthelloGPT; superposition & SAEs
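The linear probes in the probing branch ask whether a concept is linearly decodable from a model's frozen activations. The sketch below trains a logistic-regression probe in plain Python. It assumes synthetic toy "activations" with a planted concept direction rather than vectors extracted from a real model (in practice these would come from TransformerLens or nnsight hooks), and every function name here is hypothetical.

```python
import math
import random


def make_toy_activations(n, direction, noise=0.3, seed=1):
    """Hypothetical data: label 1 lies along +direction, label 0 along -direction."""
    rng = random.Random(seed)
    acts, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y else -1.0
        acts.append([sign * d + rng.gauss(0.0, noise) for d in direction])
        labels.append(y)
    return acts, labels


def train_linear_probe(activations, labels, lr=0.1, epochs=200):
    """Fit w·x + b with full-batch gradient descent on binary cross-entropy."""
    dim = len(activations[0])
    w, b, n = [0.0] * dim, 0.0, len(activations)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(activations, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            err = p - y  # derivative of the cross-entropy loss w.r.t. z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Only the probe's parameters change; the activations stay frozen.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, activations, labels):
    """Fraction of examples where the sign of w·x + b matches the label."""
    correct = 0
    for x, y in zip(activations, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == bool(y))
    return correct / len(labels)


acts, labels = make_toy_activations(200, direction=[1.0, -1.0, 0.5, 0.0])
w, b = train_linear_probe(acts, labels)
print(f"probe accuracy: {probe_accuracy(w, b, acts, labels):.2f}")
```

High probe accuracy suggests the concept is linearly represented, but it does not show that the model uses that direction. Steering and circuit analysis in the later branches address that question.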
ARENA's interpretability modules are deep, so cohorts pick subsets and pathways rather than working through every exercise in every branch.
Course details
- Duration: multi-week modules
- Sessions: ~60 minutes each
- Format: hands-on exercises & guided discussion
- Venue: Utrecht University
- Prerequisites: Python; linear algebra & probability
- Materials: based on the ARENA open curriculum
Technical cohort logistics may differ from those of the Fundamentals course; email us for the current plan.
These courses are independently led by SAIN Utrecht and are not affiliated with Utrecht University (UU).
Questions about the course? eduutr@safeainetherlands.org
Discussion groups
Weekly research & discussion groups
Focused groups meet weekly to discuss and learn about specific AI Safety topics, connecting students, researchers, and practitioners for deeper engagement.
Technical AI Safety
Builds learning pathways in Technical AI Safety topics, including Mechanistic Interpretability, by reading and discussing current research with a focus on practical understanding.
AI Governance & Policy
Explores regulatory frameworks, risk management approaches, and governance structures for AI systems, connecting academic insights with real-world policy challenges.
Questions about discussion groups? eduutr@safeainetherlands.org
Research
Current research directions
SAIN Utrecht is expanding from education toward a research-enabled hub, with early-stage work on red-teaming LLMs, safety evaluation, interpretability, and agent behavior. Chapter research isn't limited to these themes; connect via the Research Hub to collaborate across SAIN.
Members of SAIN Utrecht contributed to this research.
About
Where AI safety meets diverse expertise
SAIN Utrecht brings AI Safety to the heart of the Netherlands, engaging a multidisciplinary community of students, researchers, and professionals at Utrecht University and beyond.
The chapter is the first AI safety initiative in the Netherlands with both international funding (from BERI) and integrated mentorship (from Pathfinder). This position enables the chapter to bridge academic research and practical AI safety work.
Under Riccardo Campanella's leadership, SAIN Utrecht is transitioning from an education-first initiative to a research-enabled hub, with early-stage work on red-teaming LLMs, safety evaluation, interpretability, and agent behavior.
Chapter highlights
- 3 iterations of the AI Safety Fundamentals program delivered
- 100+ participants across three editions of the program
- Funded by BERI with mentorship from Pathfinder
- Guest speaker events including researchers from Anthropic (60+ attendees)
- Active local community of 240+ members
- LinkedIn engagement rate of ~9% (well above the 2% benchmark)
- Growing team, aiming to double to 15 members across 4 teams
SAIN Utrecht Team
Riccardo Campanella
Director
Betül Selvi
Education Lead
Luca 'Dug' Dughera
Event Lead
Dimitra Tsolka
Public Relations Lead
Cem Kay
Research Operations
Elena Clacova
Social Media Specialist
Maria Mouratidi
Researcher
Max Schaffelder
Advisor
Thijmen van der Meijden
Facilitator
Join & contact
Reach out to the chapter, subscribe to the national Substack for updates across SAIN, or explore other ways to get involved.
Email the right team to contact SAIN Utrecht