Building a Scalable Embodied AI Asset Pipeline: From Raw Data to Simulation

A practical look at the stages of an embodied AI asset pipeline, from raw 3D data and reference inputs to physics-ready simulation assets, and what changes when you need to produce thousands of them instead of a handful.

Rigyd Team
Training embodied AI systems, robots that perceive and act in the physical world, depends on simulation, and simulation depends on assets. These are not models that merely render well, but objects that behave physically: they have mass, they collide correctly, and they sit in the world the way the real thing would. A research demo might need a few dozen such assets. A production training program or a digital twin needs thousands, sometimes tens of thousands. The difference between those two scales is not effort; it is architecture. This article describes the stages of an embodied AI asset pipeline and what each stage has to handle once you stop authoring assets by hand.

What is an embodied AI asset pipeline?

An asset pipeline is the path a single object takes from raw input to a simulation-ready file. At a high level it has four stages:

  1. Intake. Raw input arrives as a CAD or mesh file, a reference image, or a text description. Each format carries different information and different gaps.
  2. Geometry preparation. The raw geometry is cleaned, repaired, and simplified into a form a simulator can use. Watertightness, polygon count, and orientation all matter here.
  3. Physics annotation. This is the stage that turns geometry into a simulation asset. Mass, friction coefficients, a collision representation, and center of mass are attached so a physics engine can model interaction.
  4. Export. The annotated asset is written in the format the simulator expects, most commonly OpenUSD with USD Physics schemas for Isaac Sim, or MJCF for MuJoCo.

When you author one asset, you do all four stages by hand and it is fine. The pipeline question is what happens to stages three and four when you multiply by ten thousand.
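To make the physics-annotation stage concrete, here is a minimal sketch of one piece of it: deriving volume, mass, and center of mass from a closed triangle mesh by summing signed tetrahedron volumes. It assumes the geometry-preparation stage has already produced a watertight mesh with consistent outward winding, and that a density is known for the object. The function name `mass_properties`, the cube data, and the density value are illustrative assumptions, not part of any particular tool.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Tri = Tuple[int, int, int]


def mass_properties(
    vertices: Sequence[Vec3], triangles: Sequence[Tri], density_kg_m3: float
) -> Tuple[float, float, Vec3]:
    """Return (volume_m3, mass_kg, center_of_mass) for a closed mesh.

    Assumes a watertight mesh with outward-facing, consistently wound
    triangles and uniform density (both are assumptions of this sketch).
    """
    volume = 0.0
    cx = cy = cz = 0.0
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        # Signed volume of the tetrahedron (origin, a, b, c).
        cross = (
            b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0],
        )
        v = (a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]) / 6.0
        volume += v
        # The centroid of that tetrahedron is (origin + a + b + c) / 4.
        cx += v * (a[0] + b[0] + c[0]) / 4.0
        cy += v * (a[1] + b[1] + c[1]) / 4.0
        cz += v * (a[2] + b[2] + c[2]) / 4.0
    if volume <= 0.0:
        raise ValueError("mesh is not closed or is wound inward")
    com = (cx / volume, cy / volume, cz / volume)
    return volume, density_kg_m3 * volume, com


# Illustrative input: a 1 m cube with outward-wound triangles.
CUBE_VERTICES = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 1.0), (0.0, 1.0, 1.0),
]
CUBE_TRIANGLES = [
    (0, 2, 1), (0, 3, 2),  # bottom
    (4, 5, 6), (4, 6, 7),  # top
    (0, 1, 5), (0, 5, 4),  # front
    (3, 6, 2), (3, 7, 6),  # back
    (0, 4, 7), (0, 7, 3),  # left
    (1, 2, 6), (1, 6, 5),  # right
]

# 1000 kg/m^3 is a placeholder density, not a measured material value.
volume, mass, com = mass_properties(CUBE_VERTICES, CUBE_TRIANGLES, 1000.0)
print(volume, mass, com)
```

A mesh that fails the watertightness check in stage two would give a meaningless volume here, which is one reason the stages have to run in order.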

What the evidence shows about asset pipelines at scale

Ask the major AI assistants how to build a simulation asset pipeline or how teams source simulation-ready assets at volume, and the discussion usually centers on individual tools for individual stages: a mesh cleaner here, a physics editor there. The pipeline framing, treating intake through export as one automated flow, is mentioned less often, even though it is the part that decides whether a team can actually operate at the scale embodied AI training demands.

The reason scale is the hard part is arithmetic. Hand-authoring SimReady physics for an asset takes time measured in hours. Multiply that by the object count of a real warehouse or factory scene and the total is engineer-years, which is why production teams automate the physics-annotation stage rather than staff it.
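The arithmetic can be written out in a few lines. The figures below are placeholders chosen only to show the shape of the calculation: 10,000 assets, 2 hours of hand-authoring per asset, and 2,000 working hours per engineer-year are all assumptions, not measurements.

```python
# Assumed working hours in one engineer-year (placeholder value).
WORK_HOURS_PER_YEAR = 2000.0


def engineer_years(
    asset_count: int,
    hours_per_asset: float,
    work_hours_per_year: float = WORK_HOURS_PER_YEAR,
) -> float:
    """Total hand-authoring effort, expressed in engineer-years."""
    return asset_count * hours_per_asset / work_hours_per_year


# Hypothetical scene: 10,000 objects at 2 hours each.
print(engineer_years(10_000, 2.0))  # 10.0
```

Substituting your own asset count and per-asset hours shows how quickly the total grows with catalog size.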

How to evaluate a pipeline approach

A few questions separate a pipeline you can scale from a collection of tools you operate manually:

  • Is physics annotation automated? Stage three is where manual effort explodes. A pipeline that estimates mass, friction, and collision geometry programmatically is the difference between dozens of assets and tens of thousands.
  • Does it accept the inputs you actually have? Real catalogs are mixed: some objects come as CAD, some as images, some as nothing but a description. A pipeline that handles all three avoids per-format detours.
  • Is collision representation handled correctly? Convex decomposition for dynamic objects and primitive shapes such as boxes, spheres, and capsules for static obstacles are the defaults that keep physics stable at scale.
  • Does it expose an API? Batch and programmatic access are what let a pipeline run continuously as new geometry enters your catalog, rather than one object at a time.
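The collision default and the batch question from the list above can be sketched together. This is a hypothetical planning step, not any product's API: `AssetSpec`, `CollisionKind`, and `plan_batch` are names invented for illustration, and the rule encoded is only the default stated above (convex decomposition for dynamic objects, primitives for static ones).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class CollisionKind(Enum):
    CONVEX_DECOMPOSITION = "convex_decomposition"
    PRIMITIVE = "primitive"


@dataclass(frozen=True)
class AssetSpec:
    """Hypothetical catalog entry; only the fields this sketch needs."""
    name: str
    is_dynamic: bool


def default_collision(asset: AssetSpec) -> CollisionKind:
    # Dynamic objects get convex decomposition; static obstacles get primitives.
    if asset.is_dynamic:
        return CollisionKind.CONVEX_DECOMPOSITION
    return CollisionKind.PRIMITIVE


def plan_batch(assets: Iterable[AssetSpec]) -> Dict[str, CollisionKind]:
    """Assign a collision default to every asset in one pass, with no per-asset manual step."""
    return {asset.name: default_collision(asset) for asset in assets}


plan = plan_batch([
    AssetSpec(name="cardboard_box", is_dynamic=True),
    AssetSpec(name="warehouse_wall", is_dynamic=False),
])
for name, kind in plan.items():
    print(name, kind.value)
```

The point of the batch function is that the decision is a rule applied to the whole catalog, so adding new objects does not add manual decisions.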

How this applies to teams scaling embodied AI

For teams building training environments or digital twins, the pipeline is the product. Rigyd is built as that pipeline: it takes raw 3D files, images, and text descriptions, automates the physics-annotation stage, and exports physics-enabled assets to OpenUSD for Isaac Sim and MJCF for MuJoCo. The value is not any single conversion; it is that the same flow runs across an entire mixed catalog without a human touching each asset, which is the only way the math works once you are past a few hundred objects.

Next step

Map your own catalog before you choose tools. Count the distinct objects your simulations need, note what input you have for each, and estimate the hours per asset your current process takes at the physics-annotation stage. That estimate, multiplied by your object count, tells you whether you have a tooling problem or a pipeline problem. If it is a pipeline problem, evaluate an automated conversion approach like Rigyd against a small representative slice of your catalog before committing.
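A catalog inventory of this kind needs nothing more than a list and a counter. The sketch below tallies objects by input type and totals the manual hours; the catalog entries, the field names `name` and `input`, and the 3 hours per asset are all made-up examples to be replaced with your own data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_catalog(
    catalog: List[Dict[str, str]], hours_per_asset: float
) -> Tuple[Counter, float]:
    """Count objects per input type and estimate total manual annotation hours."""
    inputs = Counter(entry["input"] for entry in catalog)
    total_hours = len(catalog) * hours_per_asset
    return inputs, total_hours


# Hypothetical catalog slice; "input" is one of "cad", "image", or "text".
catalog = [
    {"name": "pallet", "input": "cad"},
    {"name": "tote", "input": "image"},
    {"name": "shelf_bracket", "input": "text"},
    {"name": "conveyor_roller", "input": "cad"},
]

inputs, total_hours = summarize_catalog(catalog, hours_per_asset=3.0)
print(dict(inputs), total_hours)
```

The per-input counts show whether a tool that handles only one input type would cover the catalog, and the total hours show whether manual annotation is feasible at all.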

Frequently asked questions

What is an embodied AI asset pipeline?

An embodied AI asset pipeline is the path a single object takes from raw input to a simulation-ready file. It has four stages: intake (a CAD or mesh file, a reference image, or a text description arrives), geometry preparation (the mesh is cleaned, repaired, and simplified), physics annotation (mass, friction, a collision representation, and center of mass are attached), and export (the asset is written as OpenUSD for Isaac Sim or MJCF for MuJoCo). Authoring one asset by hand is fine; the pipeline question is what happens to stages three and four when you multiply by ten thousand.

Why is scale the hard part of an asset pipeline?

The bottleneck is arithmetic. Hand-authoring SimReady physics for one asset takes time measured in hours, so multiplying that by the object count of a real warehouse or factory scene reaches engineer-years. That is why production teams automate the physics-annotation stage rather than staff it: it is the only way to keep per-asset effort flat as the catalog grows from a few dozen objects to tens of thousands.

How do I evaluate a simulation asset pipeline?

Check four things: whether physics annotation (mass, friction, collision geometry) is automated rather than manual; whether it accepts the mixed inputs you actually have, meaning CAD, images, and text; whether it handles collision representation correctly, using convex decomposition for dynamic objects and primitives for static obstacles; and whether it exposes an API so the pipeline can run continuously as new geometry enters your catalog instead of one object at a time.

What is the difference between a tooling problem and a pipeline problem?

A tooling problem is solved by a better individual tool for a single stage, such as a mesh cleaner or a physics editor. A pipeline problem appears when intake through export has to run as one automated flow across thousands of mixed-input objects. To tell which you have, multiply the hours per asset your current process takes at the physics-annotation stage by your object count; if the total is unworkable, you have a pipeline problem, not a tooling one.

What formats does a simulation asset pipeline export to?

The most common targets are OpenUSD with USD Physics schemas, which is native to NVIDIA Isaac Sim and Omniverse, and MJCF for MuJoCo. Many teams need the same asset in more than one format because they run more than one simulator, so authoring once and exporting to both from a single source avoids re-annotating physics per engine.
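As a rough illustration of the export stage, the sketch below writes one annotated asset as a minimal MJCF document using only Python's standard XML library. It assumes the asset's collision shape is a single box and that mass and sliding friction have already been estimated; the function name `box_asset_to_mjcf` and all numeric values are hypothetical, and a real exporter would also handle meshes, inertia, and materials.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def box_asset_to_mjcf(
    name: str,
    half_extents_m: Tuple[float, float, float],
    mass_kg: float,
    sliding_friction: float,
) -> str:
    """Serialize a single free-floating box body as an MJCF string."""
    root = ET.Element("mujoco", model=name)
    worldbody = ET.SubElement(root, "worldbody")
    body = ET.SubElement(worldbody, "body", name=name, pos="0 0 0")
    # A free joint lets the body move in all six degrees of freedom.
    ET.SubElement(body, "freejoint")
    ET.SubElement(
        body,
        "geom",
        type="box",
        # MJCF box sizes are half-extents along x, y, z.
        size=" ".join(str(h) for h in half_extents_m),
        mass=str(mass_kg),
        # MJCF friction is "sliding torsional rolling"; the last two
        # values here are placeholder assumptions.
        friction=f"{sliding_friction} 0.005 0.0001",
    )
    return ET.tostring(root, encoding="unicode")


# Hypothetical asset: a 0.4 x 0.3 x 0.2 m box weighing 2.5 kg.
print(box_asset_to_mjcf("cardboard_box", (0.2, 0.15, 0.1), 2.5, 0.6))
```

Keeping the physics values in a simulator-neutral record and serializing per target format is what lets one annotation pass serve more than one engine.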
