
SAIN Utrecht

AI Safety in the heart of the Netherlands. Building a multidisciplinary community at Utrecht University and beyond, with international backing and a growing research focus.

Programs

What we run in Utrecht

Courses

AI Safety Fundamentals

SAIN Utrecht's AI Safety Fundamentals series has run for three editions, reaching more than 100 participants: BSc/MSc students, researchers, engineers, and public-sector professionals. Each week spotlights a different theme, so newcomers can drop in at any session and keep coming back for the next.

Course outline (weekly themes)

  • 1. Introduction: high-level overview of AI safety, why it matters now, capabilities & diffusion, types of risks, and families of solutions (technical & governance).
  • 2. Risks & incidents: social harms, frontier risks, misuse, and loss of control; real-world cases and how research lags behind deployment.
  • 3. Technical AI safety: robustness & jailbreaking, scalable oversight, alignment, evaluations, and mechanistic interpretability.
  • 4. Regulation & governance: the EU AI Act, accountability, lifecycle governance, evidence & audits (e.g. the NIST AI RMF functions GOVERN, MAP, MEASURE, MANAGE), and obligations for generative and general-purpose AI (GPAI).
  • 5. Why safety is hard: industry incentives, funding gaps, expert disagreement, slow regulation vs. fast-moving labs, and geopolitical race dynamics.
  • 6. Safety in practice: red-teaming, checklists, tooling, pathways (fellowships, thesis topics, community), and a bridge to hackathons & deeper study.

Course details

Duration: 6 weeks
Sessions: ~60 minutes each
Modular: join any week
Format: interactive, with short presentations and Q&A
Venue: Utrecht University
Audience: students, researchers, professionals

Topics and timings may vary by cohort; email us for the latest schedule.

Courses

Technical AI Safety

SAIN Utrecht is the first SAIN chapter to teach from the ARENA curriculum, giving participants a hands-on, research-aligned technical track alongside our broader programming.

1. Reinforcement learning & LLMs

Proximal Policy Optimisation (PPO)

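PPO is a deep RL algorithm built for stable, sample-efficient training: it limits how far each update can move the policy, so the agent can reuse each batch of experience for several optimisation passes without destabilising, while still balancing exploration and exploitation. It is used in environments ranging from robotics to games. You'll implement a PPO agent on CartPole, train it to strong performance quickly, and experiment with reward shaping and extensions (e.g. Atari, MuJoCo).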
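
The core of PPO's stability fits in a few lines. The sketch below computes the clipped surrogate objective for a batch of actions, given log-probabilities under the new and old policies plus advantage estimates. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative assumptions, not ARENA's code; a full implementation would use a tensor library and add value-function and entropy terms.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate objective (to be maximised).

    Hypothetical helper for illustration; inputs are per-action values.
    """
    if not advantages or not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - eps, 1 + eps] so one update cannot move the policy too far.
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage: the new policy doubles an action's probability (ratio = 2).
# With a positive advantage the gain is capped at 1 + eps = 1.2;
# with a negative advantage the full penalty of about -2.0 is kept.
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
print(ppo_clip_objective([math.log(2.0)], [0.0], [-1.0]))
```

Taking the minimum means the clip only removes incentives to push the ratio further in a direction that already looks good; it never hides a worsening.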

RL from Human Feedback (RLHF)

RLHF fine-tunes models against a reward signal, typically learned from human preference comparisons, with PPO commonly serving as the optimisation algorithm. Building on the PPO work, the track walks through a full RLHF implementation, moving from classic RL setups to autoregressive language models with TransformerLens. It covers the training objective, the rollout and learning phases, and an end-to-end RLHFTrainer-style workflow.
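
Two quantities recur in many RLHF pipelines: a pairwise preference loss for training a reward model from human comparisons (the Bradley-Terry formulation), and a KL penalty that keeps the fine-tuned policy close to a frozen reference model during PPO. The sketch below shows both on plain Python floats. The function names, the summed per-token log-probability difference used as a KL estimate, and the `kl_coef=0.1` default are illustrative assumptions; ARENA's exercises may structure this differently.

```python
import math
from typing import Sequence


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).

    Low when the reward model scores the human-preferred response higher.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_penalised_reward(
    reward: float,
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    kl_coef: float = 0.1,
) -> float:
    """Sequence reward minus a KL penalty toward a frozen reference model.

    The KL term is estimated from the sampled tokens as the sum of
    log pi_policy(token) - log pi_ref(token).
    """
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    return reward - kl_coef * kl_estimate


print(preference_loss(0.0, 0.0))  # log 2, about 0.693
print(kl_penalised_reward(1.0, [-1.0, -2.0], [-1.5, -2.5]))  # 0.9
```

The penalty discourages the policy from drifting into text the reference model finds unlikely just to exploit the reward.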

2. Mechanistic interpretability

The track starts with a grounding in transformers and mechanistic interpretability, then branches into probing, circuits, and toy models, mirroring ARENA's syllabus. Exercises combine notebooks with tools such as TransformerLens, nnsight, SAELens, and Neuronpedia.

Syllabus map (high level)

  • Foundations: transformers from scratch; intro to mech interp
  • Probing & representations: linear probes; function vectors & model steering; interpretability with SAEs at scale (builds on the superposition material); activation oracles
  • Circuits in LLMs: indirect object identification; SAE-based circuits & transcoders
  • Toy models: balanced brackets; grokking & modular arithmetic; OthelloGPT; superposition & SAEs

ARENA's interpretability modules are deep; cohorts pick subsets and pathways rather than every exercise in every branch.
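
A linear probe asks whether a property can be read off a model's hidden activations with a linear classifier. The sketch below trains a logistic-regression probe by batch gradient descent on toy two-dimensional "activations"; the data, function names, learning rate, and epoch count are hypothetical stand-ins. In the ARENA exercises the activations would come from a real model (for example, cached via TransformerLens hooks) and training would use a tensor library.

```python
import math
from typing import List, Sequence, Tuple


def train_linear_probe(
    activations: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights w and bias b by batch gradient descent."""
    dim = len(activations[0])
    n = len(activations)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(activations, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            err = p - y  # gradient of cross-entropy loss w.r.t. the logit z
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        # Average the gradients over the batch and take one descent step.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 if the probe's logit is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy "activations": the property is encoded along the first dimension.
acts = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.1], [-0.8, -0.3]]
labels = [1, 1, 0, 0]
w, b = train_linear_probe(acts, labels)
print([probe_predict(w, b, x) for x in acts])  # [1, 1, 0, 0]
```

A probe that succeeds shows the information is linearly decodable; it does not by itself show the model uses that direction, which is where steering and circuit analysis come in.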

Course details

Duration: multi-week modules
Sessions: ~60 minutes each
Format: hands-on exercises & guided discussion
Venue: Utrecht University
Prerequisites: Python; linear algebra & probability
Materials: based on the open ARENA curriculum

Logistics for the technical cohort may differ from the Fundamentals course; email us for the current plan.

These courses are independently led by SAIN Utrecht and are not affiliated with Utrecht University (UU).

Questions about the course? eduutr@safeainetherlands.org

Discussion groups

Weekly research & discussion groups

Focused groups meet weekly to discuss and learn about specific AI Safety topics, connecting students, researchers, and practitioners for deeper engagement.

Technical AI Safety

Building learning pathways in technical AI Safety topics, including mechanistic interpretability. Reading and discussing current research with a focus on practical understanding.

AI Governance & Policy

Exploring regulatory frameworks, risk management approaches, and governance structures for AI systems. Connecting academic insights with real-world policy challenges.

Questions about discussion groups? eduutr@safeainetherlands.org

Research

Current research directions

SAIN Utrecht is expanding from an education focus into a research-enabled hub, with early-stage work on red-teaming LLMs, safety evaluation, interpretability, and agent behavior. Chapter research isn't limited to these themes; connect via the Research Hub to collaborate across SAIN.

Members of SAIN Utrecht contributed to this research.

About

Where AI safety meets diverse expertise

SAIN Utrecht brings AI Safety to the heart of the Netherlands, engaging a multidisciplinary community of students, researchers, and professionals at Utrecht University and beyond.

The chapter is the first AI safety initiative in the Netherlands with both international funding (from BERI) and integrated mentorship (from Pathfinder). This position lets the chapter bridge academic research with practical AI safety work.

Under Riccardo Campanella's leadership, SAIN Utrecht is transitioning from an education-first initiative to a research-enabled hub, with early-stage work on red-teaming LLMs, safety evaluation, interpretability, and agent behavior.

Chapter highlights

  • 3 iterations of the AI Safety Fundamentals program delivered
  • 100+ participants across three editions of the program
  • Funded by BERI with mentorship from Pathfinder
  • Guest speaker events including researchers from Anthropic (60+ attendees)
  • Active local community of 240+ members
  • LinkedIn engagement rate of ~9% (well above the typical 2% benchmark)
  • Growing team, aiming to double to 15 members across 4 teams

SAIN Utrecht Team

Riccardo Campanella

Director

Betül Selvi

Education Lead

Luca 'Dug' Dughera

Event Lead

Dimitra Tsolka

Public Relations Lead

Cem Kay

Research Operations

Elena Clacova

Social Media Specialist

Maria Mouratidi

Researcher

Max Schaffelder

Advisor

Thijmen van der Meijden

Facilitator