HiBerNAC: Hierarchical Brain-emulated Robotic Neural Agent Collective for Disentangling Complex Manipulation

Our brain-inspired planning and manipulation framework models cognition as a multimodal multi-agent system whose structure emulates the functional specialization of the human brain, as illustrated in the figure.

For complex, long-horizon tasks, the system dynamically activates distinct agents that process language, vision, and episodic memory inputs. These agents function analogously to sensory cortices, extracting salient features and contextual information. The fused multimodal representations are routed to a central planning module inspired by the prefrontal cortex, where high-level decision-making and task decomposition occur. The resulting plan is then refined by a correction mechanism modeled on the inferior olivary nucleus, which introduces predictive error feedback for robust adaptation.

In contrast, simple or reactive tasks bypass high-level planning and use a streamlined single-agent pathway akin to spinal reflex arcs in biological systems. This pathway relies on a 'past work memory' module that recalls previously successful execution patterns and slightly modifies them to fit the current context, ensuring both rapid response and task relevance.

Task routing between the two pathways is governed by a context-aware task classifier that estimates cognitive load and task complexity. The architecture thereby integrates deliberative and reflexive behaviors, supporting real-time adaptability and efficiency. The structure graph accompanying this section outlines these modules and their interactions, reflecting both the functional and anatomical parallels to the human brain.

The system's implementation integrates three interacting components: (1) a multi-agent neural structure for high-level planning, (2) an asynchronous pipeline for hierarchical task management, and (3) a reactive VLA system for real-time control execution.
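The routing between the deliberative and reflexive pathways can be pictured with a minimal sketch. Everything in it is a hypothetical illustration: the `Task` fields, the toy `estimate_complexity` score, and the threshold are assumptions, since the text does not specify how the context-aware classifier is implemented.

```python
from dataclasses import dataclass


@dataclass
class Task:
    instruction: str
    num_subgoals: int  # hypothetical proxy for task complexity
    novelty: float     # hypothetical score in [0, 1]; 0 means seen before


def estimate_complexity(task: Task) -> float:
    """Toy stand-in for the task classifier (an assumption, not the real model)."""
    return task.num_subgoals * (0.5 + task.novelty)


def route(task: Task, threshold: float = 2.0) -> str:
    """Pick the pathway that should handle the task."""
    if estimate_complexity(task) >= threshold:
        # Multi-agent planning followed by error-feedback correction.
        return "deliberative"
    # Single-agent pathway that reuses past successful executions.
    return "reflex"


simple_route = route(Task("pick up the cup", num_subgoals=1, novelty=0.1))
complex_route = route(Task("set the table for two", num_subgoals=5, novelty=0.8))
```

In this sketch a familiar single-step instruction falls below the threshold and takes the reflex path, while a novel multi-step instruction is sent to the planner.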

Metrics for Real-to-Sim Evaluation

An effective simulation-based evaluation should correlate well with real-world evaluation in both policy ranking and policy performance.

To measure such correlations, one can apply the traditional Pearson correlation coefficient ("r"), but it has two limitations: (1) Pearson correlation only assesses the linear fit between real and simulated performance, whereas simulated evaluation does not necessarily need a linear relationship, as long as it reflects real-world performance improvements between different policies (middle-right); (2) Pearson correlation does not reflect the range of values it is computed over. For policy sets that perform closely in the real world (far-right), Pearson r may change drastically in response to small real-world performance differences, which can often be attributed to the inherent noise in real-world evaluations.
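The first limitation can be seen in a small worked example. The success rates below are made-up illustrative numbers, not results from this work: the simulated scores are a monotonic but nonlinear function of the real ones, so the policy ranking is preserved exactly while Pearson r still falls well below 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranking(values):
    """Policy indices ordered from worst to best score."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical success rates for five policies.
real = [0.1, 0.3, 0.5, 0.7, 0.9]
sim = [score ** 4 for score in real]  # monotonic, but far from linear

r = pearson_r(real, sim)
same_order = ranking(real) == ranking(sim)
print(f"Pearson r = {r:.3f}, ranking preserved: {same_order}")
```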

Thus, we introduce the Mean Maximum Rank Violation (MMRV) metric (lower is better) to better assess the consistency of policy rankings between real and simulated evaluation. The key underlying quantity is the rank violation between two policies, which weighs the significance of the simulator ranking the two policies incorrectly by the corresponding margin in real-world performance. MMRV then aggregates the N^2 pairwise rank violations by averaging, over all N policies, the worst-case rank violation of each policy.
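A minimal sketch of this computation follows. The exact formula is an assumption based on the description above: a pair of policies counts as a violation when the simulator orders them differently from the real world, and the violation is weighted by the absolute gap in real-world performance. The scores in the example are made up for illustration.

```python
def rank_violation(real_i, real_j, sim_i, sim_j):
    """Real-world margin if sim and real disagree on the pair's order, else 0."""
    disagree = (sim_i < sim_j) != (real_i < real_j)
    return abs(real_i - real_j) if disagree else 0.0


def mmrv(real, sim):
    """Average over policies of each policy's worst-case rank violation."""
    n = len(real)
    worst_per_policy = [
        max(rank_violation(real[i], real[j], sim[i], sim[j]) for j in range(n))
        for i in range(n)
    ]
    return sum(worst_per_policy) / n


# Hypothetical success rates for three policies.
real = [0.2, 0.5, 0.9]
consistent_sim = [0.3, 0.6, 0.8]  # same ordering as real
swapped_sim = [0.6, 0.3, 0.8]     # first two policies are swapped

mmrv_consistent = mmrv(real, consistent_sim)
mmrv_swapped = mmrv(real, swapped_sim)
```

With a consistent ordering no pair is violated, so the metric is zero. Swapping two policies whose real-world scores differ by 0.3 penalizes both of them by that margin, so a mis-ranking between nearly tied policies costs little while one between clearly separated policies costs a lot.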

Visual Matching Mitigates the Real-to-Sim Visual Gap

Visual discrepancies between real-world and simulated environments can constitute a distribution shift that adversely affects a learned policy’s behavior, rendering simulated evaluation unreliable. Our goal is to match the simulator visuals to those of the real-world environment with only a modest amount of manual effort. Our proposed Visual Matching consists of (1) green screening, i.e., segmenting out interactive simulated assets and overlaying them onto real-world backgrounds; and (2) texture matching, which involves projecting real object textures onto simulation assets and tuning robot arm colors using real videos.
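The green screening step amounts to a per-pixel composite, sketched below under simplifying assumptions: images are small nested lists of RGB tuples, and the segmentation mask of interactive simulated assets is already given. The function name `composite` and the data layout are hypothetical and chosen only for illustration.

```python
def composite(sim_rgb, real_rgb, mask):
    """Overlay masked simulated pixels onto a real background image.

    Images are nested lists indexed [row][col] of (r, g, b) tuples.
    mask[row][col] is True where the pixel belongs to an interactive
    simulated asset (robot arm, manipulated object).
    """
    return [
        [s if m else r for s, r, m in zip(sim_row, real_row, mask_row)]
        for sim_row, real_row, mask_row in zip(sim_rgb, real_rgb, mask)
    ]


# 2x2 toy example: simulated render is red, real background is blue.
red, blue = (255, 0, 0), (0, 0, 255)
sim_rgb = [[red, red], [red, red]]
real_rgb = [[blue, blue], [blue, blue]]
mask = [[True, False], [False, True]]

overlay = composite(sim_rgb, real_rgb, mask)
```

Only the masked pixels come from the simulator, so the policy sees a real-world background everywhere else.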

System Identification Mitigates the Real-to-Sim Control Gap

The goal of mitigating the control gap between simulated and real-world environments is to ensure that policy actions executed in simulation yield effects on the robot’s end-effector comparable to those observed when the same actions are executed on the real robot. To close this gap, we perform system identification (SysID) on a small sample of trajectories from the real-world dataset.

Videos: Real World Rollout; Control without SysID; Control with SysID.
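The idea behind the system identification step described above can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional end-effector with a first-order tracking response and a single unknown `gain`, fitted by grid search. The model, the parameter, and the "real" trajectory (generated here from a known gain as a stand-in for logged data) are all assumptions for illustration, not the procedure used in this work.

```python
def simulate(gain, targets, x0=0.0):
    """Roll out a first-order tracker: x moves a fraction `gain` toward each target."""
    positions, x = [], x0
    for target in targets:
        x = x + gain * (target - x)
        positions.append(x)
    return positions


def trajectory_error(sim_positions, real_positions):
    """Mean squared error between simulated and real end-effector positions."""
    squared = [(s - r) ** 2 for s, r in zip(sim_positions, real_positions)]
    return sum(squared) / len(squared)


def identify_gain(targets, real_positions, candidates):
    """Grid search for the gain whose simulated rollout best matches the real one."""
    return min(
        candidates,
        key=lambda g: trajectory_error(simulate(g, targets), real_positions),
    )


targets = [1.0] * 10                      # commanded positions
real_positions = simulate(0.3, targets)   # stand-in for a logged real trajectory
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best_gain = identify_gain(targets, real_positions, candidates)
```

In practice the fitted quantities would be the simulated controller's parameters and the error would be averaged over several real trajectories, but the structure is the same: replay the recorded actions in simulation and choose the parameters that minimize the end-effector tracking error.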

HiBerNAC: An Open-Source Vision-Language-Action Model

We present HiBerNAC, a Hierarchical Brain-emulated robotic Neural Agent Collective that combines (1) multimodal VLA planning and reasoning with (2) neuro-inspired reflection and multi-agent mechanisms, and is designed specifically for complex robotic manipulation tasks.
