SimReady assets for evaluation and testing

Stand up held-out evaluation catalogs, regression-test suites, and safety scenarios that match the real environment your robot will deploy into. Rigyd generates the asset diversity that makes policy evaluation honest.

The problem

Why existing workflows fall short.

Evaluating on the training distribution is not evaluation

A policy trained on a fixed asset set and then evaluated on that same set overstates its real-world performance. Honest evaluation needs a held-out catalog the policy never saw during training, which is exactly what most teams cannot afford to build by hand.

Regression testing breaks at the asset layer

When the asset library shifts (new SKUs, geometry updates, calibration changes), the policy metrics shift with it, and there is no clean way to attribute the change to the policy versus the assets. Regression tests need a stable asset baseline.

Safety scenarios need the long tail

Edge-case and adversarial assets (oddly shaped, unexpectedly heavy, slippery, fragile) are the hardest to author by hand and the most important for safety testing. Generic libraries skip them; manual authoring runs out of patience before it runs out of edge cases.

How Rigyd helps

AI-native infrastructure that automates the hard parts.

Bulk generation of held-out evaluation catalogs

Generate evaluation catalogs of 10,000+ assets in hours rather than months, sampled from a different distribution than the training catalog so that the held-out evaluation is genuinely held out. Physics calibration matches the training catalog to within domain-randomization (DR) variance, so metrics stay comparable across the two.

Versioned asset baselines for regression

The Enterprise API pins each generated asset to a content hash and a Rigyd version. Regression suites reference the hash; a deliberate asset-library refresh is its own pull request, and policy-metric changes are attributable to either the policy or the assets, never both at once.
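One way to enforce the "never both at once" rule in CI is to label every evaluation run with the policy version and a digest of the pinned asset hashes, then refuse to compare two runs where both differ. The sketch below is illustrative only: `RunLabel`, `attribute_change`, and the field names are hypothetical and are not part of the Rigyd API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunLabel:
    """What an evaluation run was measured against (hypothetical schema)."""
    policy_version: str
    asset_manifest_digest: str  # assumed: a digest over the pinned per-asset hashes


def attribute_change(baseline: RunLabel, candidate: RunLabel) -> str:
    """Return which side a metric change between two runs can be attributed to."""
    policy_changed = baseline.policy_version != candidate.policy_version
    assets_changed = baseline.asset_manifest_digest != candidate.asset_manifest_digest
    if policy_changed and assets_changed:
        # Both moved at once, so the metric delta cannot be attributed cleanly.
        raise ValueError("policy and assets both changed; compare against an intermediate run")
    if policy_changed:
        return "policy"
    if assets_changed:
        return "assets"
    return "none"
```

A CI job could call `attribute_change` before publishing a metric diff and fail the build on `ValueError`, which forces asset refreshes and policy updates into separate changes.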

Long-tail edge-case generation

Text-to-asset and image-to-asset paths produce edge cases that would take days to author manually: a 10 kg metal handle on a plastic mug, a wet floor tile with a friction coefficient of 0.02, a fragile container with a high centre of mass. Edge-case suites become a deliberate design exercise instead of a manual-authoring bottleneck.

10,000+

asset evaluation catalogs built in hours

content-hashed

asset versioning for regression baselines

~5 min

per asset, including edge-case and adversarial generation

Build an evaluation catalog as serious as your training catalog

Generate held-out evaluation assets, regression baselines, and edge-case suites in hours rather than weeks.

Starts at $29/month. 30 credits included.

Frequently asked questions

How do I build a held-out evaluation catalog that is actually held out?

Generate the training catalog and the evaluation catalog as two distinct Rigyd batches with different source inputs: training from one set of CAD exports, images, or text prompts; evaluation from a disjoint set. Both batches use the same calibration model so the metric is comparable, but the assets the policy sees at training time are disjoint from the assets it sees at evaluation. The Enterprise API surfaces both as separate versioned collections.
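Before submitting the two batches, the source inputs themselves can be partitioned so that no CAD export, image, or prompt lands in both. A minimal sketch, assuming each source input has a stable string identifier; `assign_split` and `split_sources` are hypothetical helpers, not Rigyd functions. Hashing the identifier makes the assignment deterministic across machines and reruns. Note that this guarantees disjointness only; it does not by itself shift the evaluation distribution.

```python
import hashlib


def assign_split(source_id: str, eval_fraction: float = 0.2) -> str:
    """Deterministically assign a source input to 'train' or 'eval'."""
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "eval" if bucket < eval_fraction else "train"


def split_sources(source_ids, eval_fraction: float = 0.2):
    """Partition source inputs into two disjoint batches."""
    train, held_out = set(), set()
    for source_id in source_ids:
        if assign_split(source_id, eval_fraction) == "eval":
            held_out.add(source_id)
        else:
            train.add(source_id)
    # Each identifier goes to exactly one set, so the batches cannot overlap.
    assert train.isdisjoint(held_out)
    return train, held_out
```

Because the split depends only on the identifier, adding new source inputs later never moves an existing input from one batch to the other.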

Can I use the same asset set for both training and regression testing?

Yes, with one constraint: pin the asset hash. Rigyd emits a content hash per asset; lock the regression suite to those hashes so the asset baseline is stable across CI runs. When you deliberately refresh the asset library (new generation, new calibration model), it is its own change with its own diff, and policy regressions are attributable to either the policy update or the asset update, never silently both.
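A lockfile check along these lines can run at the start of every CI job. This is a sketch under stated assumptions: both arguments are plain mappings from an asset identifier to the content hash reported for it, and the function names are hypothetical. It treats the hash as an opaque string and makes no assumption about how Rigyd computes it.

```python
def diff_against_lock(locked: dict, current: dict) -> dict:
    """Compare pinned hashes with the hashes reported for the current asset set."""
    locked_ids, current_ids = set(locked), set(current)
    return {
        "added": sorted(current_ids - locked_ids),
        "removed": sorted(locked_ids - current_ids),
        "changed": sorted(
            asset_id
            for asset_id in locked_ids & current_ids
            if locked[asset_id] != current[asset_id]
        ),
    }


def baseline_is_stable(locked: dict, current: dict) -> bool:
    """True only when the current asset set matches the lock exactly."""
    return not any(diff_against_lock(locked, current).values())
```

When `baseline_is_stable` returns `False`, the diff doubles as the change description for the asset-refresh pull request.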

How do I generate safety and edge-case scenarios?

Use the text-to-asset path with explicit edge-case prompts: "a wet, oily floor tile with friction below 0.05", "a top-heavy container with the centre of mass at 80% of the height", "a fragile glass cylinder with brittle restitution". Each prompt produces a SimReady asset with the requested physics in the manipulation- or navigation-relevant ranges, ready to drop into your safety-evaluation suite. Image-to-asset works the same way for visual edge cases (camouflage textures, unusual shapes, partial occlusions).
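It can help to keep each edge-case prompt next to the physical constraint it is meant to produce, so the suite can confirm that a generated asset actually landed in the requested range. The sketch below assumes the generated asset exposes its physics as a dictionary; the `EdgeCase` class, the keys `friction` and `com_height_fraction`, and the 0.75 to 0.85 tolerance are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EdgeCase:
    prompt: str
    parameter: str                   # hypothetical key in the asset's physics metadata
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def satisfied_by(self, physics: dict) -> bool:
        """Check that the reported value exists and lies inside the requested range."""
        value = physics.get(self.parameter)
        if value is None:
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


SAFETY_SUITE = [
    EdgeCase("a wet, oily floor tile with friction below 0.05",
             "friction", maximum=0.05),
    EdgeCase("a top-heavy container with the centre of mass at 80% of the height",
             "com_height_fraction", minimum=0.75, maximum=0.85),  # assumed tolerance
]
```

Assets that fail `satisfied_by` can be regenerated or corrected by hand before they enter the safety suite.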

Does evaluation accuracy depend on Rigyd's estimates being exact?

No, and that is the point. Evaluation only needs the asset distribution to match what the policy will see in deployment. Rigyd's calibration sits within the domain-randomization (DR) variance bands most training pipelines already use (mass ±15–20%, friction ±0.1), which is the same band the deployment environment falls within. Where you have measured ground truth (catalog masses, lab friction tests), override and lock those values; for the long tail of evaluation assets, AI calibration falls inside the tolerance that honest evaluation needs.
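For assets where ground truth exists, the band check and the override rule can be written down directly. A minimal sketch using the bands quoted above (the tighter ±15% for mass, ±0.1 for friction); the function names and the measured-over-estimated precedence flag are illustrative assumptions, not Rigyd behaviour.

```python
def within_dr_band(estimated_mass, measured_mass,
                   estimated_friction, measured_friction,
                   mass_rel_tol=0.15, friction_abs_tol=0.1):
    """True when both estimates fall inside the DR band around the measurements."""
    mass_ok = abs(estimated_mass - measured_mass) <= mass_rel_tol * measured_mass
    friction_ok = abs(estimated_friction - measured_friction) <= friction_abs_tol
    return mass_ok and friction_ok


def resolve_value(estimated, measured=None):
    """Prefer measured ground truth; return (value, locked)."""
    if measured is not None:
        return measured, True   # measured values are locked against regeneration
    return estimated, False     # long-tail assets keep the estimate
```

Running `within_dr_band` over the subset of assets with lab measurements gives a spot check on how often estimates land inside the band your training pipeline already randomizes over.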

Which simulators are these evaluation catalogs valid for?

Any simulator that imports OpenUSD or MJCF. OpenUSD output drops into Isaac Sim, Isaac Lab, Omniverse, Unreal Engine, Unity, and Gazebo Sim via USD import. MJCF output drops into MuJoCo, MJX, and Genesis. The same evaluation catalog can be replayed across multiple simulators if your evaluation methodology compares simulator behaviour. The asset distribution is identical; only the physics engine differs.
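For a cross-simulator replay, the mapping above can live in the evaluation harness as a small lookup. This sketch only encodes the simulator-to-format pairs listed in the answer; `export_format` is a hypothetical helper name.

```python
SIMULATOR_FORMAT = {
    "isaac sim": "OpenUSD",
    "isaac lab": "OpenUSD",
    "omniverse": "OpenUSD",
    "unreal engine": "OpenUSD",
    "unity": "OpenUSD",
    "gazebo sim": "OpenUSD",
    "mujoco": "MJCF",
    "mjx": "MJCF",
    "genesis": "MJCF",
}


def export_format(simulator: str) -> str:
    """Return the asset format to request for a given simulator."""
    key = simulator.strip().lower()
    if key not in SIMULATOR_FORMAT:
        raise KeyError(f"no known export format for simulator: {simulator!r}")
    return SIMULATOR_FORMAT[key]
```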