NuRec: Bridging the Visual Fidelity Gap in Simulation

We recently had some fun exploring whether you can 3D map the Ekumen HQ using just a smartphone.

That drive to effortlessly digitize physical spaces stems from a massive bottleneck in robotics: bringing the real world into simulation. Traditionally, building these environments requires heavy approximation. When validating perception-heavy pipelines, the gap between handcrafted 3D meshes and the physical world becomes a critical issue. While traditional mesh environments excel at calculating physical interactions, they are labor-intensive to build from scratch and consistently fall short of the photorealistic visual fidelity required to rigorously test modern, AI-driven computer vision systems.

The Baseline: Understanding NVIDIA NuRec

NVIDIA’s NuRec (Neural Reconstruction) is engineered to solve this exact issue. By ingesting real-world camera and LiDAR data to generate environments based on 3D Gaussian models, NuRec replaces synthetic approximations with neurally rendered, highly realistic 3D scenes natively within Omniverse.

Under the hood, NuRec divides its pipeline into two distinct phases to balance accuracy and speed. First, 3DGRUT acts as the reconstruction engine, natively handling complex camera distortions like rolling shutter. Once the scene is trained and exported as a USDZ file, NuRec hands the active simulation rendering to gsplat. This separation leverages 3DGRUT for highly accurate physical modeling during setup, while relying on gsplat’s efficient, CUDA-accelerated rasterization for fast performance at runtime.

To rigorously validate NuRec’s utility, we established a testbed in Isaac Sim 5.1 using pre-reconstructed scenes from the NVIDIA Physical AI dataset on Hugging Face: a pure Gaussian volume environment (zh_lounge), a baseline hybrid scene combining neural rendering with a hidden collision mesh (andoria), and a more complex hybrid scene (living room). Supplementary 3D mesh assets were used alongside these scenes.

The Hybrid Architecture: Unlocking Kinematic Interaction

A raw Gaussian volume acts exclusively as a rendering primitive. Because it has no underlying geometric mesh, it is completely invisible to the PhysX engine: unless collision geometry is added, robots simply fall through the environment.

To unlock kinematic interaction, engineers must use a hybrid architecture that intelligently pairs the neural Gaussian volume (dedicated strictly to rendering) with a 3D proxy mesh (dedicated strictly to collision detection).
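
As a rough illustration of the "hidden collider" half of that pairing, here is a minimal sketch that emits a small USD text layer containing a flat floor quad with the standard `PhysicsCollisionAPI` schema applied and its visibility set to invisible. The prim names (`World`, `CollisionProxy`), the quad geometry, and the helper `proxy_floor_usda` are all hypothetical; a real NuRec export ships its own proxy mesh, and we are not claiming this matches its layout.

```python
def proxy_floor_usda(half_extent_m: float = 5.0, visible: bool = False) -> str:
    """Return .usda text for a flat quad that acts only as a collider.

    Assumption: prim names and geometry are illustrative placeholders,
    not the structure of an actual NuRec USDZ export.
    """
    h = float(half_extent_m)
    # "inherited" is USD's default (visible) state; "invisible" hides the prim.
    visibility = "inherited" if visible else "invisible"
    return f'''#usda 1.0
(
    defaultPrim = "World"
    metersPerUnit = 1
    upAxis = "Z"
)

def Xform "World"
{{
    def Mesh "CollisionProxy" (
        prepend apiSchemas = ["PhysicsCollisionAPI"]
    )
    {{
        int[] faceVertexCounts = [4]
        int[] faceVertexIndices = [0, 1, 2, 3]
        point3f[] points = [({-h}, {-h}, 0), ({h}, {-h}, 0), ({h}, {h}, 0), ({-h}, {h}, 0)]
        bool physics:collisionEnabled = 1
        token visibility = "{visibility}"
    }}
}}
'''


if __name__ == "__main__":
    # Print a 10 m x 10 m hidden floor collider layer.
    print(proxy_floor_usda(5.0))
```

The point of the sketch is the separation of roles: the mesh carries collision data and nothing else, while the Gaussian volume (not shown) stays responsible for what the camera sees.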

With physical interaction validated, we extended the testbed to a full end-to-end perception task: a simulation-to-perception data pipeline that runs a DINOv3 ROS 2 object detection node within a populated NuRec environment. The model detected objects in both the neurally rendered background and the independent 3D mesh props, which shows that high-level visual perception and physical interaction can coexist in the same scene.

Our DINOv3-based ROS 2 package runs object detection across the populated scene.

Figure 1: Object detection across hybrid data types. The DINOv3 ROS 2 node successfully performs object detection on both the neurally rendered background architecture and the explicitly introduced 3D mesh props.

Sensor Validation: The RTX and PhysX Divide

Beyond physical interaction, the hybrid architecture introduces a second constraint: sensor compatibility is tightly coupled to proxy mesh visibility. Simulation sensors do not interpret neural environments uniformly, creating a mandatory trade-off for simulation engineers:

PhysX Sensors: Because these sensors rely entirely on explicit geometric data, they are blind to the Gaussian volume and strictly require the hybrid proxy mesh to function.
RTX Sensors: Functionality is dictated by the collision mesh’s visibility state. When mesh visualization is disabled, the RGB camera natively renders the photorealistic Gaussian volume perfectly, but Depth, Semantic, and RTX LiDAR sensors fail. Conversely, when the mesh is enabled, those specialized sensors function correctly, but the RGB camera output breaks due to overlapping geometric and neural rendering. Engineers must carefully orchestrate proxy mesh visibility to extract the correct data streams.
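
One way to reason about this trade-off is to group the requested sensor streams into capture passes by the mesh visibility state each one needs. The sketch below is plain Python, not Isaac Sim API: the `Sensor` enum, the lookup table, and `plan_capture_passes` are hypothetical names, and treating PhysX sensors as indifferent to visibility (they need the mesh to exist, not to be drawn) is our assumption rather than something we measured.

```python
from enum import Enum


class Sensor(Enum):
    RTX_RGB = "rtx_rgb"
    RTX_DEPTH = "rtx_depth"
    RTX_SEMANTIC = "rtx_semantic"
    RTX_LIDAR = "rtx_lidar"
    PHYSX = "physx"


# True  -> proxy mesh must be visible for valid output
# False -> proxy mesh must be hidden for valid output
# None  -> assumed to work in either state (assumption for PhysX sensors)
REQUIRED_MESH_VISIBLE = {
    Sensor.RTX_RGB: False,
    Sensor.RTX_DEPTH: True,
    Sensor.RTX_SEMANTIC: True,
    Sensor.RTX_LIDAR: True,
    Sensor.PHYSX: None,
}


def plan_capture_passes(sensors):
    """Group sensors into 'mesh_hidden' / 'mesh_visible' capture passes."""
    passes = {"mesh_hidden": [], "mesh_visible": []}
    agnostic = []
    for sensor in sensors:
        need_visible = REQUIRED_MESH_VISIBLE[sensor]
        if need_visible is None:
            agnostic.append(sensor)
        elif need_visible:
            passes["mesh_visible"].append(sensor)
        else:
            passes["mesh_hidden"].append(sensor)
    # Attach visibility-agnostic sensors to a pass that is already required,
    # so they never force an extra pass on their own.
    only_hidden = passes["mesh_hidden"] and not passes["mesh_visible"]
    target = "mesh_hidden" if only_hidden else "mesh_visible"
    passes[target].extend(agnostic)
    # Drop passes nobody needs.
    return {name: group for name, group in passes.items() if group}


if __name__ == "__main__":
    plan = plan_capture_passes([Sensor.RTX_RGB, Sensor.RTX_DEPTH, Sensor.PHYSX])
    for name, group in plan.items():
        print(name, [s.value for s in group])
```

Requesting RGB together with depth yields two passes, which is the practical cost of the divide: a single frame cannot serve both streams with one visibility setting.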

This video shows the impact of mesh visualization on depth camera detection; without active mesh visibility, the camera cannot detect the objects.

Performance Trade-Offs

While loading these neural volumes introduces a 32% longer initial “warm-up” phase compared to traditional meshes, their runtime performance holds up well. In our benchmarks, the NuRec setup achieved a mean of 138.89 FPS, slightly edging out the pure mesh environment’s 132.98 FPS (tested on an RTX 4080 Laptop GPU). This suggests that neural visual fidelity does not inherently cripple simulation speed. However, adding complex rigid body meshes will compound physics calculations, which could lower this baseline.
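
To put those two FPS figures in perspective, the short calculation below converts them to per-frame times and a relative difference. It uses only the numbers quoted above; the function names are ours.

```python
NUREC_FPS = 138.89  # mean FPS, NuRec setup (from our benchmark)
MESH_FPS = 132.98   # mean FPS, pure mesh environment (from our benchmark)


def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def relative_gain(candidate_fps: float, baseline_fps: float) -> float:
    """Fractional FPS advantage of candidate over baseline."""
    return (candidate_fps - baseline_fps) / baseline_fps


if __name__ == "__main__":
    print(f"NuRec frame time: {frame_time_ms(NUREC_FPS):.2f} ms")          # 7.20 ms
    print(f"Mesh frame time:  {frame_time_ms(MESH_FPS):.2f} ms")           # 7.52 ms
    print(f"Relative gain:    {relative_gain(NUREC_FPS, MESH_FPS):.2%}")   # 4.44%
```

A gap of roughly 0.3 ms per frame is small enough that we read the result as "comparable" rather than as a clear win for either side.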

Table 1: Performance benchmark comparison: NuRec vs. traditional mesh. This breakdown highlights the core computational trade-offs of the hybrid architecture.

Key Takeaways

The Hybrid Approach is Mandatory: Pure neural volumes lack physics; you must pair them with a hidden 3D proxy mesh to enable collisions and navigation.
Anticipate Sensor Fragmentation: Sensors interpret environments differently. RTX RGB captures the volume, but PhysX, Depth, and LiDAR sensors rely on the geometric mesh, requiring careful visibility management.
Photorealism at a Practical Cost: Despite a longer initialization phase, the hybrid NuRec architecture delivers real-time frame rates comparable to traditional meshes.
The Blueprint for Physical AI: When configured correctly, this hybrid architecture bridges the historical gap between vision and physics. It provides the exact foundation required to validate complex, end-to-end AI pipelines.

Looking Ahead: Cosmos and Versatile Data Ingestion

NVIDIA’s roadmap promises to further streamline the neural simulation pipeline. The rendering refinement model, currently using the Sana-based Difix3D+, will soon transition to the advanced Cosmos framework.

Additionally, upcoming NCore updates aim to introduce built-in converters for easier ingestion of proprietary fleet recordings and custom data. We’ll keep tracking NuRec’s development as Cosmos and NCore mature. If you’re working with neural simulation environments and want to compare notes, reach out!