# First Physics Audit of Open-X Embodiment — 216 Episodes, 78.1% Pass Rate
I built a tool, S2S, that checks sensor data against biomechanical physics laws before it is used for training.
No ML. No learned classifier. Just equations: F=ma coupling, rigid-body kinematics,
jerk bounds, Hurst persistence.

I ran it on RoboTurk from Open-X Embodiment. Here is what came out.

---
## The Audit
**Dataset:** RoboTurk (Open-X Embodiment, Stanford)

**Episodes:** 216 human-teleoperated demonstrations

**Windows certified:** 1,143

| Tier | Count | % |
|------|-------|---|
| GOLD | 284 | 24.8% |
| SILVER | 609 | 53.3% |
| BRONZE | 242 | 21.2% |
| REJECTED | 8 | 0.7% |

**Pass rate (GOLD+SILVER): 78.1%**

**Top failing law: `imu_internal_consistency` (32.4% of windows)**
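
The percentages and the pass rate follow directly from the tier counts. A minimal sketch of that arithmetic, using only the numbers in the table (the variable names are illustrative and are not part of S2S):

```python
# Tier counts copied from the table above.
tier_counts = {"GOLD": 284, "SILVER": 609, "BRONZE": 242, "REJECTED": 8}

# Total number of certified windows.
total = sum(tier_counts.values())

# Share of each tier in percent, rounded to one decimal as in the table.
tier_pct = {tier: round(100 * n / total, 1) for tier, n in tier_counts.items()}

# Pass rate as defined in this post: GOLD plus SILVER over all windows.
passed = tier_counts["GOLD"] + tier_counts["SILVER"]
pass_rate = round(100 * passed / total, 1)

print(total, tier_pct, pass_rate)
```

The counts sum to the 1,143 certified windows, and 893 of them (GOLD plus SILVER) give the 78.1% pass rate.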

---
## What the Finding Means
`imu_internal_consistency` checks that translational acceleration and rotational
acceleration are physically coupled, as they are in real human motion.

In RoboTurk, `world_vector` (translation) and `rotation_delta` (rotation) are
commanded through separate channels in the smartphone teleoperation interface.
The two channels have different latencies, and S2S detects the resulting mismatch.

This is not a bug in the data. It is a measurable property of the teleoperation
interface, and S2S quantifies it: in 32.4% of windows, the translational and
rotational commands are physically inconsistent with each other.

For robot training, this matters: a model trained on these windows learns motion in which
hand translation and wrist rotation are decoupled. That is not how humans move.
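
To make the latency idea concrete, here is a small pure-Python sketch that estimates the lag between two channels by cross-correlation. It is not how S2S implements `imu_internal_consistency`; the function name `best_lag`, the synthetic signals, and the 3-sample delay are all assumptions chosen for illustration. Only the 15 Hz rate comes from the RoboTurk figures in this post.

```python
import math

def best_lag(a, b, max_lag):
    """Return (k, r): the shift k in samples that maximizes the Pearson
    correlation r between a[i] and b[i + k]. Assumes len(a) == len(b)."""
    def corr(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
        sxx = sum((p - mx) ** 2 for p in x)
        syy = sum((q - my) ** 2 for q in y)
        if sxx == 0 or syy == 0:
            return 0.0  # a constant signal carries no timing information
        return sxy / math.sqrt(sxx * syy)

    best_k, best_r = 0, -2.0
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = a[: len(a) - k], b[k:]   # b is shifted later than a
        else:
            x, y = a[-k:], b[: len(b) + k]  # b is shifted earlier than a
        r = corr(x, y)
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r

RATE_HZ = 15   # RoboTurk command rate quoted in this post
N = 60         # four seconds of synthetic data
DELAY = 3      # hypothetical channel delay, in samples

# Synthetic translation magnitude: one smooth reach-like bump.
translation = [math.exp(-((i - 20) / 4.0) ** 2) for i in range(N)]
# Synthetic rotation magnitude: the same bump, arriving DELAY samples late.
rotation = [translation[i - DELAY] if i >= DELAY else translation[0]
            for i in range(N)]

lag, r = best_lag(translation, rotation, max_lag=8)
print(f"estimated lag: {lag} samples = {lag / RATE_HZ:.2f} s (r = {r:.3f})")
```

In this toy case the rotation channel trails the translation channel by 3 samples, or 0.2 s at 15 Hz. A coupling check built on this idea would flag windows whose estimated lag, or whose peak correlation, falls outside what physically coupled motion allows.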

---
## Comparison to Real Human IMU
| Dataset | Pass Rate | Top Failing Law |
|---------|-----------|-----------------|
| NinaPro DB5 (real human, 2000 Hz) | 100% SILVER | none |
| RoboTurk (teleoperation, 15 Hz) | 78.1% | `imu_internal_consistency` (32.4%) |

The real human IMU data passes every window at SILVER. The teleoperation data shows a measurable quality gap.

---
## Reproduce It
```bash
# Install the package
pip install s2s-certify
# Run the RoboTurk audit script (assumes a local copy of the S2S repository)
cd S2S && python3 certify_roboturk.py
```
Full audit data: the `Scan2s/s2s-certified-motion` dataset on Hugging Face.

---
This is the first physics audit of Open-X Embodiment that I am aware of.
If anyone has run a similar analysis on other Open-X subsets, I would like to know.

The tool works on any IMU/EMG dataset. Zero dependencies. Pure Python.