DoseRAD2026 Dataset
Overview
The DoseRAD2026 dataset is a large-scale benchmark designed to advance fast and accurate 3D radiation dose calculation for both photon and proton therapy.
It is the first publicly available dataset to provide:
- Paired CT and MRI volumes
- Beam-level Monte Carlo (MC) dose distributions
- Support for both photon (VMAT) and proton (pencil beam) modalities
The dataset enables direct dose prediction and calculation on both CT and MRI, addressing a key bottleneck in MR-guided and adaptive radiotherapy workflows.
It supports all four challenge tasks:
- Photon dose calculation on CT
- Photon dose calculation on MRI
- Proton dose calculation on CT
- Proton dose calculation on MRI
All tasks are based on a unified, spatially aligned MRI–CT dataset, enabling consistent cross-modality benchmarking.
💾 Download 💾
The dataset (864 GB) can be accessed via Zenodo/Hugging Face:
https://doi.org/10.5281/zenodo.19347848
A detailed dataset description can be found in the accompanying dataset paper:
https://doi.org/10.48550/arXiv.2604.12778
Cohort
The dataset comprises 122 patients with thoracic and abdominal malignancies:
- Training set: 75 patients (released in April 2026)
  - 36 abdominal
  - 39 thoracic
- Test set: 40 patients (private until March 2030)
- External test set: 7 patients (private, no public release)
All cases were carefully screened for high-quality deformable registration, ensuring accurate voxel-wise correspondence between CT and MRI.
Each case includes:
- Planning MRI (MR-Linac acquisition, 0.35 T bSSFP)
- Deformably registered CT
- Beam configuration data
- Beam-level Monte Carlo dose distributions
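Because every task relies on voxel-wise correspondence between CT and MRI, a quick sanity check after downloading is to confirm that a case's CT and MRI volumes share the same voxel grid. The sketch below reads only the ASCII header of a MetaImage (`.mha`) file via the standard `DimSize`, `ElementSpacing` and `Offset` keys. It assumes (this page does not state it) that the CT and MRI of a case are stored on one common grid within a task, and it ignores the direction matrix for brevity; the function names are illustrative.

```python
def read_mha_header(path):
    """Parse the ASCII header of a MetaImage (.mha) file into a dict of strings.

    The header ends at the ElementDataFile line; in a .mha file the binary
    voxel data follows immediately, so reading stops there.
    """
    header = {}
    with open(path, "rb") as f:
        for raw in f:
            key, _, value = raw.decode("ascii").partition("=")
            header[key.strip()] = value.strip()
            if key.strip() == "ElementDataFile":
                break
    return header


def grid_of(header):
    """Return (size, spacing, origin) describing the voxel grid."""
    size = tuple(int(v) for v in header["DimSize"].split())
    spacing = tuple(float(v) for v in header["ElementSpacing"].split())
    # MetaIO treats Offset, Origin and Position as synonyms for the image origin.
    origin_str = next(
        (header[k] for k in ("Offset", "Origin", "Position") if k in header), None
    )
    origin = tuple(float(v) for v in origin_str.split()) if origin_str else (0.0,) * len(size)
    return size, spacing, origin


def same_grid(path_a, path_b, tol=1e-4):
    """True if two volumes share size, spacing and origin (within tol).

    Note: the direction (orientation) matrix is not compared here.
    """
    size_a, sp_a, org_a = grid_of(read_mha_header(path_a))
    size_b, sp_b, org_b = grid_of(read_mha_header(path_b))
    return size_a == size_b and all(
        abs(x - y) <= tol for x, y in zip(sp_a + org_a, sp_b + org_b)
    )
```

Reading only the header keeps the check fast even for large volumes, since no voxel data is loaded.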
Image Pre-processing
To ensure high-quality multimodal alignment and simulation readiness, the dataset underwent several processing steps:
- Deformable CT→MRI registration
- Air cavity correction using MR-based segmentation
- Body masking and foreground standardization
- Task-specific resampling:
  - Photon tasks: 2 × 2 × 2 mm³
  - Proton tasks: 1 × 1 × 3 mm³
These steps ensure physically consistent inputs for dose calculation and learning-based methods.
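Resampling to the task-specific spacing changes the number of voxels along each axis while preserving the physical extent (voxel count times spacing). The helper below only computes the target grid size and voxel volume; the interpolation itself would be done with an imaging library such as SimpleITK. It assumes spacings are ordered (x, y, z) with the 3 mm proton spacing along z, which this page does not state explicitly, and the source grid in the usage example is hypothetical.

```python
# Target voxel spacing in mm per task; axis order (x, y, z) is an assumption.
TASK_SPACING_MM = {
    "photon": (2.0, 2.0, 2.0),
    "proton": (1.0, 1.0, 3.0),
}


def resampled_size(size, spacing, modality):
    """Grid size that keeps the same physical extent at the task spacing.

    Physical extent along each axis is size * spacing; dividing by the
    target spacing and rounding gives the voxel count on the new grid.
    """
    target = TASK_SPACING_MM[modality]
    return tuple(max(1, round(n * s / t)) for n, s, t in zip(size, spacing, target))


def voxel_volume_mm3(modality):
    """Volume of a single voxel on the task grid, in mm^3."""
    x, y, z = TASK_SPACING_MM[modality]
    return x * y * z
```

For example, a hypothetical 512 × 512 × 100 volume at 1 × 1 × 3 mm³ maps to a 256 × 256 × 150 grid for the photon tasks, while each photon voxel (8 mm³) is larger than a proton voxel (3 mm³).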
Beam Configurations
Each case contains multiple independent beams, and each beam must be evaluated separately.
Photon Beams (VMAT-style)
- Simplified model of an Elekta Versa HD Linac with an Agility 160-leaf MLC (scriptable VMAT plans based on matRad)
- Defined by:
  - Multi-leaf collimator (MLC) apertures
  - Gantry angles (full arc sampling)
  - Isocenter positions
- Dataset scale:
  - 40,500 photon beam segments (training)
Beam segments are generated from control points and augmented to increase diversity.
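A photon segment is described by its MLC aperture, gantry angle and isocenter. The dataclass below is a hypothetical in-memory representation for working with such segments; the field names and the JSON keys in `from_dict` are assumptions for illustration, not the dataset's actual schema. It relies only on the fact that the 160 leaves of the Agility MLC form 80 opposing leaf pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple

N_LEAF_PAIRS = 80  # Agility MLC: 160 leaves = 80 opposing pairs


@dataclass
class PhotonSegment:
    """Hypothetical container for one VMAT beam segment (illustrative fields)."""
    gantry_angle_deg: float
    isocenter_mm: Tuple[float, float, float]
    # One (bank A, bank B) leaf-tip position per leaf pair, in mm.
    leaf_pairs_mm: List[Tuple[float, float]]

    @classmethod
    def from_dict(cls, d: dict) -> "PhotonSegment":
        # Hypothetical JSON keys; adapt to the real beam configuration files.
        return cls(
            gantry_angle_deg=float(d["gantry_angle"]),
            isocenter_mm=tuple(float(v) for v in d["isocenter"]),
            leaf_pairs_mm=[(float(a), float(b)) for a, b in d["leaf_positions"]],
        )

    def validate(self) -> None:
        """Basic geometric consistency checks; raises ValueError on failure."""
        if not 0.0 <= self.gantry_angle_deg < 360.0:  # assumed angle convention
            raise ValueError("gantry angle must lie in [0, 360)")
        if len(self.leaf_pairs_mm) != N_LEAF_PAIRS:
            raise ValueError(f"expected {N_LEAF_PAIRS} leaf pairs")
        for a, b in self.leaf_pairs_mm:
            if a > b:
                raise ValueError("opposing leaves overlap")

    def open_pairs(self) -> int:
        """Number of leaf pairs with a non-zero gap, i.e. forming the aperture."""
        return sum(1 for a, b in self.leaf_pairs_mm if b > a)
```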
Proton Beams (Pencil Beam Scanning)
- Simplified proton beam model employing an energy-dependent, single-Gaussian approximation for both the beamlet energy spread and the spot size
- Defined by:
  - Source position and gantry angle
  - Energy levels (31.7–200.8 MeV)
  - Pencil beam spots arranged in beam’s-eye-view grids
- Dataset scale:
  - 81,000 proton beamlets (training)
Each beamlet represents a physically simulated proton dose contribution.
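Two geometric ingredients of the proton beam model, a beam's-eye-view spot grid and a single-Gaussian lateral spot profile, can be sketched as follows. The square grid shape, its spacing and its half-width are hypothetical parameters. In the dataset's model the spot size sigma is energy-dependent; since that dependence is not given on this page, sigma is simply passed in here.

```python
import math
from typing import List, Tuple


def bev_spot_grid(half_width_mm: float, spacing_mm: float) -> List[Tuple[float, float]]:
    """Regular square grid of spot positions (x, y) in the beam's-eye-view plane.

    Spots sit symmetrically around the central axis at multiples of
    spacing_mm, out to at most half_width_mm in each direction.
    """
    n = int(half_width_mm // spacing_mm)
    coords = [i * spacing_mm for i in range(-n, n + 1)]
    return [(x, y) for x in coords for y in coords]


def gaussian_spot_fluence(x: float, y: float, x0: float, y0: float, sigma_mm: float) -> float:
    """Single-Gaussian lateral fluence of one spot centred at (x0, y0).

    Normalized so that it integrates to 1 over the plane.
    """
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return math.exp(-r2 / (2.0 * sigma_mm ** 2)) / (2.0 * math.pi * sigma_mm ** 2)
```

With a 10 mm half-width and 5 mm spacing, `bev_spot_grid` returns a 5 × 5 grid of 25 spots including the central axis at (0, 0).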
Ground Truth Dose
For every beam (photon segment or proton beamlet), a corresponding 3D dose distribution is provided.
Ground truth doses are generated using:
- Full Monte Carlo particle transport (Geant4)
- Tissue-dependent material modeling (via CT-derived density maps)
Key properties:
- Beam-specific dose grids
- Aligned with CT and MRI volumes
- Reported in dose-to-medium (Gy)
- Masked to patient body region
These MC simulations serve as the reference standard for evaluation.
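Since each beam has its own dose grid and must be scored separately, evaluation naturally loops over beams. This page does not specify the challenge metrics, so the sketch below uses a simple stand-in, not the official metric: mean absolute error inside the body mask, expressed as a percentage of the maximum reference (MC) dose, then averaged over beams. Doses are flat Python lists in Gy to keep the example dependency-free; in practice NumPy arrays would be the natural choice.

```python
from statistics import mean
from typing import Dict, Sequence


def masked_mae_percent(pred: Sequence[float], ref: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean absolute dose error inside the body mask, in % of the max reference dose.

    pred and ref are flattened dose grids (Gy); mask marks voxels inside the body.
    """
    if not (len(pred) == len(ref) == len(mask)):
        raise ValueError("pred, ref and mask must have the same number of voxels")
    inside = [(p, r) for p, r, m in zip(pred, ref, mask) if m]
    if not inside:
        raise ValueError("empty mask")
    d_max = max(r for _, r in inside)
    if d_max <= 0.0:
        raise ValueError("reference dose is zero inside the mask")
    return 100.0 * mean(abs(p - r) for p, r in inside) / d_max


def evaluate_case(beams: Dict[str, tuple]) -> float:
    """Score each beam on its own dose grid, then average over beams.

    beams maps a beam identifier to a (pred, ref, mask) tuple.
    """
    return mean(masked_mae_percent(*beams[beam_id]) for beam_id in sorted(beams))
```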
Data Organization
The dataset is organized per case. Each case includes:
- CT volume (.mha)
- MRI volume (.mha)
- Beam configuration files (JSON)
- Corresponding beam-level dose distributions
Standardized file naming links each beam's parameters directly to its dose file.
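Because beam parameters and dose files follow a standardized naming scheme, pairing them can be automated. The exact scheme is not given on this page, so the sketch assumes a hypothetical convention in which `<beam>.json` holds a beam's configuration and `<beam>_dose.mha` its MC dose; `DOSE_SUFFIX` would need to be adapted to the real pattern.

```python
from pathlib import PurePath
from typing import Dict, Iterable, Tuple

DOSE_SUFFIX = "_dose"  # hypothetical: dose files named <beam>_dose.mha


def pair_beams_with_doses(filenames: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each beam stem to its (config JSON, dose .mha) file pair.

    Other files (e.g. the CT and MRI volumes) are ignored. Raises ValueError
    if a beam config has no matching dose file or vice versa.
    """
    configs, doses = {}, {}
    for name in filenames:
        p = PurePath(name)
        if p.suffix == ".json":
            configs[p.stem] = name
        elif p.suffix == ".mha" and p.stem.endswith(DOSE_SUFFIX):
            doses[p.stem[: -len(DOSE_SUFFIX)]] = name
    unpaired = set(configs) ^ set(doses)
    if unpaired:
        raise ValueError(f"unpaired beams: {sorted(unpaired)}")
    return {stem: (configs[stem], doses[stem]) for stem in sorted(configs)}
```

Failing loudly on unpaired files catches incomplete downloads before any dose calculation is run.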
Training and Test Split
The dataset is divided into:
- Public training set
  - Full access to images, beam definitions, and MC dose
- Private preliminary test set (to be released in March 2030)
  - Used to test the inference pipeline and submission on grand-challenge
- Private final test set (to be released in March 2030, excluding the 7 external patients)
  - Held out for final benchmarking
The test sets remain private until March 2030 (the external test set permanently) to ensure fair and objective evaluation.
License
The DoseRAD2026 dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0).
This allows users to:
- Use, share, and adapt the dataset for research and educational purposes
- Develop and publish methods based on the dataset
Under the following conditions:
- Attribution: Proper credit must be given to the dataset creators
- Non-commercial use only: The dataset may not be used for commercial purposes
For full license details, please refer to the official CC BY-NC 4.0 terms.