A Survey on Predictive Safety
in Embodied AI
- University College London
- Holistic AI
Abstract
Embodied AI systems operate in the physical world, where failures cause irreversible harm, yet safety research remains siloed across the robotics, autonomous-driving, and foundation-model communities. We survey 236 papers (2017–2026) through a three-axis taxonomy of safety aspects, embodied system types, and model types — including the emerging class of world action models (WAM) that unify vision-language-action and world-model architectures. We organise the field through the lens of predictive safety: the principle that an agent should evaluate candidate actions by simulating their consequences before execution. Combining topic modelling, co-occurrence analysis, and a gap-score metric that quantifies under-explored areas relative to expected coverage, we map where effort concentrates and where blind spots persist. Manipulation is well studied under robustness, alignment, and constraint satisfaction, whereas robustness in simulation environments and multi-agent systems, control-barrier methods for autonomous driving, and the transfer of safe-RL constraint formulations to vision-language-action models remain substantially neglected. Risk-weighted scoring elevates autonomous driving to the top of the priority list. We distil the results into a quantitative roadmap of research priorities to close the most consequential safety gaps in embodied AI.
The corpus, paper by paper
Each point is one paper. Colour encodes the dominant architecture, method, or paper type; columns stack by year.
Categories are assigned by keyword matching against title and abstract, mirroring the paper's analysis pipeline. When a paper matches several, the dominant tag is selected by priority (architectures > methods > paper type).
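To make the assignment rule concrete, here is a minimal Python sketch of such a priority-tiered tagger. The keyword lists and tag names below are illustrative placeholders, not the paper's actual lexicon; only the tier ordering (architectures > methods > paper type) comes from the text above.

```python
# Illustrative keyword tiers, scanned in priority order.
PRIORITY_TIERS = [
    ("architectures", {"vision-language-action": "VLA", "world model": "WM"}),
    ("methods", {"barrier function": "CBF", "safe reinforcement": "Safe RL"}),
    ("paper type", {"survey": "Survey", "benchmark": "Benchmark"}),
]

def dominant_tag(title: str, abstract: str) -> str:
    """Return the first keyword hit against title + abstract,
    scanning tiers in priority order."""
    text = f"{title} {abstract}".lower()
    for _tier, keywords in PRIORITY_TIERS:
        for keyword, tag in keywords.items():
            if keyword in text:
                return tag
    return "Uncategorised"
```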
1. Overview
The transition from digital to physical agency fundamentally changes the safety landscape. When a language model hallucinates, the consequence is correctable misinformation; when a robotic manipulator miscalculates a grasp force, the consequence may be a shattered object or an injured person. Physical actions are largely irreversible, unfold in open-ended environments that resist exhaustive specification, and affect individuals who never consented to interact with the system.
A unifying thread across the most promising approaches is predictive safety: an agent should evaluate the safety of candidate actions by simulating their consequences before committing to execution, rather than reacting to failures after they occur. Predictive safety draws on world models to generate imagined trajectories, on control barrier functions to certify they remain in safe regions, and on risk assessment to rank actions by their expected harm under uncertainty.
A model capable of imagining the future is also capable of screening unsafe futures before they are enacted.
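A minimal sketch of this screening loop is below. `world_model.rollout`, `barrier`, and `risk` are hypothetical interfaces standing in for the three ingredients named above: a learned dynamics model that imagines trajectories, a safety certificate (e.g. a control barrier function h with h(s) ≥ 0 on the safe set), and an expected-harm estimator.

```python
import numpy as np

def select_action(state, candidates, world_model, barrier, risk, horizon=10):
    """Predictive safety: simulate each candidate action before execution,
    discard any whose imagined trajectory leaves the certified-safe set,
    and rank the survivors by expected harm."""
    scored = []
    for action in candidates:
        trajectory = world_model.rollout(state, action, horizon)  # imagined future
        if any(barrier(s) < 0 for s in trajectory):  # unsafe future: screen it out
            continue
        scored.append((np.mean([risk(s) for s in trajectory]), action))
    if not scored:
        return None  # nothing certifiably safe: defer to a fallback controller
    return min(scored, key=lambda pair: pair[0])[1]  # least expected harm
```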
2. Three-axis taxonomy
We cross-reference safety aspects, embodied system types, and model paradigms. The first two define the 10 × 7 cell matrix that drives the co-occurrence and gap-score analyses; the third is tagged per-paper to capture the underlying computational paradigm.
Safety aspects (10):
- Safe reinforcement learning
- Robustness
- Alignment
- Constraint satisfaction
- Human–robot safety
- Collision avoidance
- Formal verification
- Control barrier functions
- Risk assessment
- Sim-to-real safety
Embodied system types (7):
- Manipulation
- Navigation
- Autonomous driving
- Drones / UAV
- Humanoid & legged
- Multi-agent systems
- Simulation environments
Model paradigms (5):
- Reinforcement learning
- Embodied reasoning
- Vision-language-action (VLA)
- World models (WM)
- World action models (WAM)★
★ WAMs unify VLA and world-model architectures into a single network that jointly predicts future states and the actions that produce them — the locus where predictive safety can be enforced natively.
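As an illustration of where that native enforcement could hook in, a schematic (and entirely hypothetical) WAM interface might expose the jointly predicted future state alongside the proposed action, so a safety screen can inspect the imagined state before the action reaches the actuators:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class WAMPrediction:
    next_state: np.ndarray  # imagined future state (observation or latent)
    action: np.ndarray      # the action predicted to bring it about

class WorldActionModel:
    """Schematic interface only, not a published API: because state and
    action are produced by one network, predictive safety checks can run
    on `next_state` before `action` is ever executed."""

    def predict(self, history: list[np.ndarray], goal: np.ndarray) -> WAMPrediction:
        raise NotImplementedError  # a real WAM would run its network here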
3. Where the field looks
Research effort is non-randomly concentrated across the 10 × 7 matrix (χ² = 68.9; permutation p = 0.021). Manipulation is densely studied under robustness, constraint satisfaction, and alignment; elsewhere, whole rows and columns remain near-empty.
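For readers who want to reproduce the test, one way to compute such a permutation p-value is sketched below; `aspects` and `systems` stand for the per-paper tag columns, and the paper's actual pipeline may differ in detail.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def permutation_chi2(aspects, systems, n_perm=10_000, seed=0):
    """Permutation p-value for the chi-square statistic of the
    aspect-by-system contingency table: shuffle which system tag is
    paired with which aspect tag and recompute the statistic."""
    aspects, systems = np.asarray(aspects), np.asarray(systems)
    observed, *_ = chi2_contingency(pd.crosstab(aspects, systems))
    rng = np.random.default_rng(seed)
    null = np.array([
        chi2_contingency(pd.crosstab(aspects, rng.permutation(systems)))[0]
        for _ in range(n_perm)
    ])
    return observed, (null >= observed).mean()
```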
4. Risk-weighted research gaps
The gap-score metric G measures a cell's under-coverage relative to its expected count under the independence null. Weighting by real-world consequence (autonomous driving R = 0.95; manipulation R = 0.78; simulation R = 0.30) yields the risk-weighted gap RG, which sharpens priority ordering toward high-consequence platforms. Autonomous driving sweeps four of the top five.
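The exact functional form of G is not reproduced here, so the sketch below assumes one natural reading of the definition above: the positive shortfall of the observed count O relative to the expected count E under row/column independence, G = max(0, (E − O)/E), with RG = R · G applied per system column. The top-five table that follows is from the paper; the code is an illustrative reconstruction.

```python
import numpy as np

def gap_scores(counts: np.ndarray, risk: np.ndarray) -> np.ndarray:
    """counts: 10x7 aspect-by-system co-occurrence matrix.
    risk: length-7 per-system consequence weights (e.g. autonomous
    driving 0.95, manipulation 0.78, simulation 0.30).
    Returns the risk-weighted gap RG for every cell, under the
    assumed form G = max(0, (E - O) / E)."""
    expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()
    G = np.clip((expected - counts) / expected, 0.0, None)
    return G * risk[np.newaxis, :]  # broadcast risk across the aspect rows
```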
| # | Aspect × System | RG | Why it persists |
|---|---|---|---|
| 1 | Control barrier × Autonomous driving | 0.59 | CBF synthesis assumes control-affine dynamics; multi-agent traffic violates this. |
| 2 | Alignment × Autonomous driving | 0.53 | Alignment research developed around text LLMs has limited transfer to spatial reasoning. |
| 3 | Human–robot safety × Autonomous driving | 0.53 | HRI safety frameworks were built for deterministic robots, not stochastic learned controllers. |
| 4 | Constraint satisfaction × Autonomous driving | 0.47 | Safe-RL constraint formulations rarely transfer to large-scale, mixed-traffic deployment. |
| 5 | Formal verification × Manipulation | 0.41 | Industrial robots scale into less-controlled settings without verification frameworks for learned controllers. |
Sensitivity. Gap rankings are stable across denominator variants (ρ > 0.99), bootstrap resamples, and keyword-dropout tests (τ = 0.71, Jaccard = 0.84). A 1,000-trial perturbation of the risk weights (each scaled by a uniform ±20% multiplier) preserves at least four of the baseline top-five priorities in 99.8% of trials.
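A sketch of that risk-weight perturbation test is below, reusing the illustrative `gap_scores` helper from above; `baseline_top5` is assumed to be the set of unperturbed top-five (aspect, system) index pairs.

```python
import numpy as np

def topfive_stability(counts, risk, baseline_top5, n_trials=1_000, seed=0):
    """Fraction of trials in which at least four of the baseline top-five
    cells survive after scaling each risk weight by an independent
    uniform multiplier in [0.8, 1.2] (i.e. +/-20%)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        rg = gap_scores(counts, risk * rng.uniform(0.8, 1.2, size=risk.shape))
        flat = np.argsort(rg, axis=None)[-5:]            # five largest RG cells
        top5 = set(zip(*np.unravel_index(flat, rg.shape)))
        hits += len(top5 & baseline_top5) >= 4
    return hits / n_trials
```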
5. Future directions
Three frontiers stand out for advancing the predictive-safety paradigm:
- Causal world models. Current world models learn correlational dynamics that may not faithfully capture intervention outcomes. Causal structure is essential for counterfactual safety analysis — what would have happened if the robot had not braked?
- Predictive safety as a training objective. Rather than treating safety as a post-hoc filter, future architectures should optimise for predicted harm reduction during training itself, e.g. via safety-potential reward shaping (see the sketch after this list).
- Sim-to-real safety transfer. Recent work shows that pessimistic domain randomisation can provably transfer safety guarantees — not just policies — from simulation to real hardware, opening a principled path to zero-shot safe deployment.
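On the second bullet, one concrete mechanism is potential-based reward shaping with a safety potential. The sketch below assumes a hypothetical learned `safety_potential(s)`, e.g. the negative of a world model's predicted harm, and leans on the classic policy-invariance result for potential-based shaping (Ng et al., 1999).

```python
def shaped_reward(reward, state, next_state, safety_potential, gamma=0.99):
    """Potential-based shaping r' = r + gamma * Phi(s') - Phi(s).
    Phi (`safety_potential`) is an assumed learned estimate of state safety;
    this form preserves the optimal policy while steering learning toward
    predicted-safer states (Ng et al., 1999)."""
    return reward + gamma * safety_potential(next_state) - safety_potential(state)
```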
6. Cite this work
```bibtex
@article{dacosta2026predictivesafety,
  title  = {A Survey on Predictive Safety in Embodied AI},
  author = {da Costa, Kleyton and Koshiyama, Adriano and
            Kanoulas, Dimitrios and Treleaven, Philip},
  year   = {2026},
  url    = {https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6562019}
}
```