Data - Real-time dose calculation in radiotherapy

DoseRAD2026 Dataset

Overview

The DoseRAD2026 dataset is a large-scale benchmark designed to advance fast and accurate 3D radiation dose calculation for both photon and proton therapy.

It is the first publicly available dataset to provide:

Paired CT and MRI volumes
Beam-level Monte Carlo (MC) dose distributions
Support for both photon (VMAT) and proton (pencil beam) modalities

The dataset enables direct dose prediction and calculation on both CT and MRI, addressing a key bottleneck in MR-guided and adaptive radiotherapy workflows.

It supports all four challenge tasks:

Photon dose calculation on CT
Photon dose calculation on MRI
Proton dose calculation on CT
Proton dose calculation on MRI

All tasks are based on a unified, spatially aligned MRI–CT dataset, enabling consistent cross-modality benchmarking.

💾 Download 💾

The dataset (864 GB) can be accessed via Zenodo/Hugging Face:

https://doi.org/10.5281/zenodo.19347848

A detailed dataset description can be found in the accompanying dataset paper:

https://doi.org/10.48550/arXiv.2604.12778

Cohort

The dataset comprises 122 patients with thoracic and abdominal malignancies:

Training set: 75 patients (release April 2026)
- 36 abdominal
- 39 thoracic
Test set: 40 patients (private until March 2030)
External test set: 7 patients (private, no public release)

All cases were carefully screened for high-quality deformable registration, ensuring accurate voxel-wise correspondence between CT and MRI.

Each case includes:

Planning MRI (MR-Linac acquisition, 0.35 T bSSFP)
Deformably registered CT
Beam configuration data
Beam-level Monte Carlo dose distributions

Image Pre-processing

To ensure high-quality multimodal alignment and simulation readiness, the dataset underwent several processing steps:

Deformable CT→MRI registration
Air cavity correction using MR-based segmentation
Body masking and foreground standardization
Task-specific resampling:
- Photon tasks: 2 × 2 × 2 mm³
- Proton tasks: 1 × 1 × 3 mm³

These steps ensure physically consistent inputs for dose calculation and learning-based methods.
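As a rough illustration of what the task-specific resampling means for grid size, the sketch below computes the voxel grid that covers the same physical extent at each target spacing. The input shape and spacing are hypothetical, the (x, y, z) axis order is an assumption, and the interpolation used by the dataset authors is not specified here.

```python
import math

# Target voxel spacings quoted in the text, in mm.
# Assumption: the three values are ordered (x, y, z).
TARGET_SPACING_MM = {
    "photon": (2.0, 2.0, 2.0),
    "proton": (1.0, 1.0, 3.0),
}


def resampled_shape(shape, spacing_mm, task):
    """Return the grid shape covering the same physical extent
    after resampling to the task-specific spacing."""
    target = TARGET_SPACING_MM[task]
    # The small epsilon guards against float round-off pushing an
    # exact quotient up to the next integer.
    return tuple(
        math.ceil(n * s / t - 1e-9)
        for n, s, t in zip(shape, spacing_mm, target)
    )


# Hypothetical input volume: 256 x 256 x 96 voxels at 1.5 x 1.5 x 3.0 mm.
example_shape = (256, 256, 96)
example_spacing = (1.5, 1.5, 3.0)

photon_shape = resampled_shape(example_shape, example_spacing, "photon")
proton_shape = resampled_shape(example_shape, example_spacing, "proton")
print(photon_shape, proton_shape)
```

Note that the two grids differ in voxel volume (8 mm³ for photon tasks, 3 mm³ for proton tasks), so the same patient yields differently shaped inputs depending on the task.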

Beam Configurations

Each case contains multiple independent beams, and each beam must be evaluated separately.
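A minimal sketch of what per-beam evaluation could look like: each beam is scored against its own reference dose rather than being summed into a plan dose first. The mean absolute error used here is only a placeholder and not necessarily a challenge metric; the beam identifiers and dose values are made up.

```python
def mean_absolute_error(pred, ref):
    """Mean absolute dose difference over two equally sized voxel lists."""
    if len(pred) != len(ref):
        raise ValueError("dose grids must have the same number of voxels")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def evaluate_case(pred_beams, ref_beams):
    """Score each beam on its own.

    Both arguments map a beam identifier to a flattened dose grid.
    """
    if pred_beams.keys() != ref_beams.keys():
        raise ValueError("prediction and reference must cover the same beams")
    return {
        beam_id: mean_absolute_error(pred_beams[beam_id], ref_beams[beam_id])
        for beam_id in sorted(ref_beams)
    }


# Toy data: two beams of four voxels each (values in Gy, made up).
reference = {"beam_000": [0.0, 1.0, 2.0, 1.0], "beam_001": [0.5, 0.5, 0.0, 0.0]}
prediction = {"beam_000": [0.0, 1.0, 1.0, 1.0], "beam_001": [0.5, 0.5, 0.0, 0.0]}

scores = evaluate_case(prediction, reference)
print(scores)
```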

Photon Beams (VMAT-style)

Simplified model of an Elekta Versa HD Linac with an Agility 160-leaf MLC (scriptable VMAT plans based on matRad)
Defined by:
- Multi-leaf collimator (MLC) apertures
- Gantry angles (full arc sampling)
- Isocenter positions
Dataset scale:
- 40,500 photon beam segments (training)

Beam segments are generated from control points and augmented to increase diversity.

Proton Beams (Pencil Beam Scanning)

Simplified proton beam model employing an energy-dependent, single-Gaussian approximation for both the beamlet energy spread and the spot size
Defined by:
- Source position and gantry angle
- Energy levels (31.7–200.8 MeV)
- Pencil beam spots arranged in beam’s-eye-view grids
Dataset scale:
- 81,000 proton beamlets (training)

Each beamlet represents a physically simulated proton dose contribution.
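To give a feel for the depths spanned by the quoted energy range, the sketch below applies the Bragg-Kleeman rule R = αE^p for protons in water. The parameters (α ≈ 0.0022 cm/MeV^p, p ≈ 1.77) are commonly used literature values and are an assumption here, not part of the dataset's beam model; the result is a water-equivalent estimate, not a patient-specific range.

```python
# Bragg-Kleeman rule R = alpha * E**p for protons in water.
# Assumption: commonly quoted fit parameters, not taken from the dataset.
ALPHA_CM = 0.0022   # cm / MeV**p
P_EXPONENT = 1.77   # dimensionless


def range_in_water_cm(energy_mev):
    """Approximate proton range in water (cm) for a given energy (MeV)."""
    return ALPHA_CM * energy_mev ** P_EXPONENT


# Energy limits quoted in the text.
E_MIN_MEV, E_MAX_MEV = 31.7, 200.8

r_min = range_in_water_cm(E_MIN_MEV)
r_max = range_in_water_cm(E_MAX_MEV)
print(f"{E_MIN_MEV} MeV -> {r_min:.1f} cm, {E_MAX_MEV} MeV -> {r_max:.1f} cm")
```

Under this approximation the beamlets cover ranges from roughly 1 cm to roughly 26 cm in water.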

Ground Truth Dose

For every beam (photon segment or proton beamlet), a corresponding 3D dose distribution is provided.
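Since every beam has its own dose distribution, the training-set figures quoted in this document imply the number of reference dose volumes. The sketch below does that arithmetic; the per-patient values are averages only, as the actual number of beams per patient is not stated and may vary.

```python
# Training-set figures quoted in this document.
TRAIN_PATIENTS = 75
PHOTON_SEGMENTS = 40_500
PROTON_BEAMLETS = 81_000

# One 3D dose distribution per beam.
total_dose_volumes = PHOTON_SEGMENTS + PROTON_BEAMLETS

# Averages only; individual patients may have more or fewer beams.
avg_photon_per_patient = PHOTON_SEGMENTS / TRAIN_PATIENTS
avg_proton_per_patient = PROTON_BEAMLETS / TRAIN_PATIENTS

print(total_dose_volumes, avg_photon_per_patient, avg_proton_per_patient)
```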

Ground truth doses are generated using:

Full Monte Carlo particle transport (Geant4)
Tissue-dependent material modeling (via CT-derived density maps)

Key properties:

Beam-specific dose grids
Aligned with CT and MRI volumes
Reported as dose-to-medium (Gy)
Masked to patient body region

These MC simulations serve as the reference standard for evaluation.
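When comparing a prediction against the masked reference, it can help to apply the same body mask to the prediction first. The sketch below assumes that voxels outside the body are set to zero, which is an assumption about how the masking was done; the dose values and mask are made up and stored as flat lists for simplicity.

```python
def mask_dose(dose, body_mask):
    """Zero every dose voxel that lies outside the body mask."""
    if len(dose) != len(body_mask):
        raise ValueError("dose and mask must have the same number of voxels")
    return [d if inside else 0.0 for d, inside in zip(dose, body_mask)]


# Toy example: four voxels, the first and last lie outside the body.
dose = [0.02, 0.8, 1.5, 0.03]
body_mask = [False, True, True, False]

masked = mask_dose(dose, body_mask)
print(masked)
```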

Data Organization

The dataset is structured as follows. Each case includes:

CT volume (.mha)
MRI volume (.mha)
Beam configuration files (JSON)
Corresponding beam-level dose distributions

Standardized naming ensures direct linkage between:

Beam parameters
Dose files
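The linkage between beam parameters and dose files could be exploited as sketched below, by pairing each beam JSON file with a dose file that shares its stem. The file names (`beam_000.json`, `beam_000_dose.mha`), the `_dose.mha` suffix, the dose file format, and the `gantry_angle_deg` key are all hypothetical; the actual naming scheme should be taken from the dataset itself.

```python
import json
import tempfile
from pathlib import Path


def pair_beams_with_doses(case_dir, dose_suffix="_dose.mha"):
    """Map each beam JSON file to the dose file that shares its stem.

    The suffix is a hypothetical naming convention, not the dataset's.
    """
    pairs = {}
    for config_path in sorted(Path(case_dir).glob("*.json")):
        dose_path = config_path.with_name(config_path.stem + dose_suffix)
        if not dose_path.exists():
            raise FileNotFoundError(f"no dose file for {config_path.name}")
        pairs[config_path.stem] = (config_path, dose_path)
    return pairs


# Build a throwaway case directory with two made-up beams.
with tempfile.TemporaryDirectory() as tmp:
    case_dir = Path(tmp)
    for i in range(2):
        config = {"gantry_angle_deg": 90 * i}  # hypothetical key
        (case_dir / f"beam_{i:03d}.json").write_text(json.dumps(config))
        (case_dir / f"beam_{i:03d}_dose.mha").write_bytes(b"")  # empty stand-in
    pairs = pair_beams_with_doses(case_dir)
    beam_ids = list(pairs)

print(beam_ids)
```

Raising on a missing dose file, rather than silently skipping the beam, makes incomplete downloads visible early.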

Training and Test Split

The dataset is divided into:

Public training set
- Full access to images, beam definitions, and MC dose
Private preliminary test set (release in March 2030)
- Used to test the inference pipeline and submission on Grand Challenge
Private final test set (release in March 2030, excluding the 7 external patients)
- Held out for final benchmarking

The test sets remain inaccessible until their planned release in March 2030 to ensure fair and objective evaluation.

License

The DoseRAD2026 dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).

This allows users to:

Use, share, and adapt the dataset for research and educational purposes
Develop and publish methods based on the dataset

Under the following conditions:

Attribution: Proper credit must be given to the dataset creators
Non-commercial use only: The dataset may not be used for commercial purposes

For full license details, please refer to the official CC BY-NC 4.0 terms.