SAIN Utrecht

AI Safety in the heart of the Netherlands. Building a multidisciplinary community at Utrecht University and beyond, with international backing and a growing research focus.

Win4AISafety — Open Research Summer Challenge

Events

Past & upcoming events

Upcoming events — see details on lu.ma/bsv03bzb.

Past events this academic year — recaps on linkedin.com/company/sain-utrecht.

Events: eventsutr@safeainetherlands.org

Programs

What we run in Utrecht

Courses

AI Safety Fundamentals

SAIN Utrecht's AI Safety Fundamentals series has run three times, reaching more than 100 participants in total: BSc/MSc students, researchers, engineers, and public-sector participants. Each week covers a different theme, so newcomers can join at any session and come back for the next one.

Course outline (weekly themes)

1. Introduction: High-level overview of AI safety, why it matters now, capabilities & diffusion, types of risks, and solution families (technical & governance).
2. Risks & incidents: Social harms, frontier risks, misuse, loss of control; real-world cases and how research lags deployment.
3. Technical AI safety: Robustness & jailbreaking, scalable oversight, alignment, evaluations, mechanistic interpretability.
4. Regulation & governance: EU AI Act, accountability, lifecycle governance, evidence & audits (e.g. NIST GOVERN–MAP–MEASURE–MANAGE), generative AI / GPAI obligations.
5. Why safety is hard: Industry incentives, funding gaps, expert disagreement, slow regulation vs. fast labs, geopolitical race dynamics.
6. Safety in practice: Red-teaming, checklists, tooling, pathways (fellowships, thesis topics, community), and a bridge to hackathons & deeper study.

Course details

Duration: 6 weeks
Sessions: ~60 minutes each
Modular: Join any week
Format: Interactive (short input and Q&A)
Venue: Utrecht University
Audience: Students, researchers, professionals

Topics and timings may shift by cohort — email us for the latest schedule.

Photo: AI Safety Fundamentals program graduation (cohort highlight)

Courses

Technical AI Safety

SAIN Utrecht is the first SAIN chapter teaching from ARENA materials — a four-week technical track from transformer foundations through mechanistic interpretability and alignment, with weekly lectures, notebook exercises, and in-person discussion at Utrecht University.

Course outline (weekly themes)

1. Transformers & mechanistic interpretability: From RNNs to modern transformers: tokenization, embeddings, attention, and how models project back to vocabulary. Introduction to reverse-engineering neural networks via weights, activations, and internal circuits.
2. Probing & representations: Linear probes as detectors in activation space (truth, deception); SAE-based feature decomposition; activation oracles for open-ended questions about model internals. Risk-triage exercises for high-stakes monitoring scenarios.
3. PPO & RLHF: Why naive policy gradients fail and how PPO stabilises training. The full alignment pipeline: supervised fine-tuning, reward-model training, and PPO optimisation, with the KL penalty as a guard against reward hacking.
4. RLHF, GRPO & reward hacking: The transformer as an RL agent (tokens as actions, preference as reward). Value heads and actor–critic setups; GRPO with LoRA fine-tuning. Observing reward hacking in practice (mode collapse, prefix exploitation) and strategies for mitigating it.
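
As a rough illustration of the KL penalty mentioned in week 3, here is a minimal Python sketch. It is not taken from the course notebooks or the ARENA materials. The function name `kl_penalised_reward`, the coefficient `beta`, and the example numbers are hypothetical. It assumes you already have the per-token log-probabilities of one sampled response under the policy being trained and under a frozen reference model.

```python
def kl_penalised_reward(reward, policy_logprobs, reference_logprobs, beta=0.1):
    """Subtract a KL-style penalty from a reward-model score.

    reward: scalar score for one sampled response (hypothetical value).
    policy_logprobs / reference_logprobs: log-probabilities of the same
        sampled tokens under the trained policy and the frozen reference.
    beta: penalty strength (an illustrative default, not a recommendation).
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Sum of log pi(token) - log pi_ref(token) over the sampled tokens:
    # a single-sample estimate of the sequence-level KL divergence.
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, reference_logprobs))
    return reward - beta * kl_estimate


# A policy identical to the reference pays no penalty.
unchanged = kl_penalised_reward(1.0, [-0.5, -0.5], [-0.5, -0.5])

# A policy that has drifted far from the reference keeps less of its reward.
drifted = kl_penalised_reward(1.0, [-0.1, -0.1], [-2.0, -2.0], beta=0.1)

print(unchanged, drifted)
```

The idea is that a response scoring well with the reward model only by moving far away from the reference model's behaviour loses part of that score, which discourages the policy from exploiting quirks of the reward model.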

Each week pairs a lecture with ARENA-inspired notebook assignments using tools such as TransformerLens, SAELens, and Neuronpedia. Completing the notebooks and attending in person are both required for the certificate of completion; lectures are also streamed online for remote participants.

Course details

Duration: 4 weeks
Sessions: Weekly lectures (~90 min) + notebooks
Format: In-person lectures, streamed online; hands-on exercises & discussion
Venue: Utrecht University
Certificate: Complete notebooks & attend in person
Prerequisites: Python; linear algebra & probability
Materials: Based on ARENA open curriculum

Logistics for the technical cohort may differ from the Fundamentals course; email us for the current plan.

Photos from the Technical AI Safety course:

Week 1: Transformers & mechanistic interpretability
Week 2: Probing & representations
Week 3: PPO & RLHF
Week 4: RLHF, GRPO & reward hacking

These courses are independently led by SAIN Utrecht and are not affiliated with Utrecht University (UU).

Questions about the course? eduutr@safeainetherlands.org

Discussion groups

Weekly research & discussion groups

Focused groups meeting weekly to discuss and learn about specific AI Safety topics. Connecting students, researchers, and practitioners for deeper engagement.

Technical AI Safety

Building learning pathways in Technical AI Safety topics including Mechanistic Interpretability. Reading and discussing current research with a focus on practical understanding.

AI Governance & Policy

Exploring regulatory frameworks, risk management approaches, and governance structures for AI systems. Connecting academic insights with real-world policy challenges.

Photo: Discussion group in session, Europe 2031 scenario discussion

Questions about discussion groups? eduutr@safeainetherlands.org

Research

Current research directions

SAIN Utrecht is expanding from education toward a research-enabled hub — including early-stage work on red-teaming LLMs, safety evaluation, interpretability, and agent behavior. Chapter research isn't limited to these themes; connect via the Research Hub for collaboration across SAIN.

Upcoming

Research Hub launch

October 2026

Open challenge

Win4AISafety — Open Research Summer Challenge

Call for new research directions for the SAIN Research Hub — a six-week multidisciplinary summer challenge to define scope, conduct research, and publish findings as a Substack post.

Members of SAIN Utrecht contributed to this research.

Publication

Are LLM Belief Updates Consistent with Bayes' Theorem?

arXiv:2507.17951

About

Where AI safety meets diverse expertise

SAIN Utrecht brings AI Safety to the heart of the Netherlands, engaging a multidisciplinary community of students, researchers, and professionals at Utrecht University and beyond.

The chapter is the first AI safety initiative in the Netherlands with both international funding (from BERI) and mentorship integration (from Pathfinder). This unique position enables Utrecht to bridge academic research with practical AI safety work.

Under Riccardo Campanella's leadership, SAIN Utrecht is transitioning from an education-first initiative to a research-enabled hub, with early-stage work on red-teaming LLMs, safety evaluation, interpretability, and agent behavior.

Chapter highlights

3 iterations of the AI Safety Fundamentals program delivered
100+ participants across three editions of the program
Funded by BERI with mentorship from Pathfinder
Guest speaker events including researchers from Anthropic (60+ attendees)
Active local community of 240+ members
LinkedIn engagement rate of ~9% (well above the 2% benchmark)
Growing team, aiming to double to 15 members across 4 teams

Join & contact

Reach out to the chapter, subscribe to the national Substack for updates across SAIN, or explore other ways to get involved.

Email the right team to contact SAIN Utrecht

Formal collaboration: infoutr@safeainetherlands.org
Community Manager: cmutr@safeainetherlands.org
Education: eduutr@safeainetherlands.org
Research: research@safeainetherlands.org
Events: eventsutr@safeainetherlands.org
Substack: substack@safeainetherlands.org
Public Outreach: prutr@safeainetherlands.org

SAIN Utrecht Team

Riccardo Campanella

Luca 'Dug' Dughera

Dimitra Tsolka

Cem Kaya

Elena Clacova

Maria Mouratidi

Max Schaffelder

Leslie Spedner

Thijmen van der Meijden
