Evaluation Framework Overview¶
navarena-bench is a navigation model evaluation framework based on 3D Gaussian Splatting and occupancy grids. It provides a modular, extensible evaluation system supporting multiple navigation tasks and agent types.
Prerequisites
Before running evaluation, prepare: ① V1 format scene assets (under $NAVARENA_DATA_DIR/assets/); ② Episode data conforming to the evaluation data format.
Core Features¶
The evaluation framework provides:
- Modular Design - Registration-based architecture; supports extension with new environments, tasks, evaluators, agents, and metrics
- 3D GS Rendering - Scene rendering via gsplat
- Collision Detection - Collision detection based on occupancy grid map
- Multi-Task Support - PointNav, ObjectNav, ImageNav, VLN
- Multi-Agent Support - Local, Remote, ViNT, GNM, NoMaD, MultiModalNav, LanguageNav
- Replay Visualization - Evaluation result replay and visualization
Architecture¶
```mermaid
flowchart TB
    subgraph Core[Core Modules]
        Eval[Evaluator]
        Env[Environment]
        Agent[Agent]
        Dataset[Dataset]
        Metrics[Metrics]
    end
    subgraph Tasks[Task Types]
        PN[PointNav]
        ON[ObjectNav]
        IN[ImageNav]
        VLN[VLN]
    end
    subgraph Agents[Agent Types]
        Local[Local Agent]
        Remote[Remote Agent]
        ViNT[ViNT Agent]
        GNM[GNM Agent]
        NoMaD[NoMaD Agent]
        MultiModal[MultiModalNav Agent]
        LangNav[LanguageNav Agent]
    end
    subgraph Envs[Environment Types]
        GS[3D GS Environment]
    end
    Eval --> Env
    Eval --> Agent
    Eval --> Dataset
    Eval --> Metrics
    Tasks --> Eval
    Agents --> Agent
    Envs --> Env
```

Core Components¶
Evaluator¶
The evaluator coordinates the environment, agent, and dataset, runs the evaluation loop, and aggregates metrics.
Supported evaluators:

- PointNavEvaluator - Point goal navigation
- ObjectNavEvaluator - Object goal navigation
- ImageNavEvaluator - Image goal navigation
- VLNEvaluator - Vision-Language Navigation
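The coordination an evaluator performs reduces to a plain reset/step loop over an episode. A minimal sketch of that loop (the class and method names here are illustrative assumptions, not the actual navarena_bench API):

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    observation: dict
    done: bool
    info: dict = field(default_factory=dict)


def run_episode(env, agent, episode, max_steps=500):
    """Reset env and agent, then step until the episode ends or the budget runs out."""
    obs = env.reset(episode)
    agent.reset()
    trajectory = []
    for _ in range(max_steps):
        action = agent.act(obs)    # agent maps observation -> action
        result = env.step(action)  # environment updates robot state
        trajectory.append(action)
        obs = result.observation
        if result.done:
            break
    return trajectory, env.info()
```

The `max_steps` default mirrors `max_steps_per_episode` in the evaluation configuration; per-episode info returned by the environment is what the metrics module aggregates afterwards.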
Environment¶
The environment provides scene rendering (3D GS), occupancy grid collision detection, robot state management, and goal validation.
Supported environments:

- GaussianSplattingEnv - 3D GS environment
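Occupancy-grid collision detection comes down to converting a world position into grid indices and testing the cell. A minimal 2D sketch (the grid layout, origin convention, and function name are assumptions, not the GaussianSplattingEnv internals):

```python
def is_colliding(grid, origin, resolution, position):
    """Return True if `position` (x, y in meters) falls in an occupied cell.

    grid       : 2D list of bools (row-major), True = occupied
    origin     : (x0, y0) world coordinates of cell (0, 0)
    resolution : cell size in meters
    """
    # Floor division keeps negative coordinates on the correct side of the origin.
    ix = int((position[0] - origin[0]) // resolution)
    iy = int((position[1] - origin[1]) // resolution)
    if not (0 <= ix < len(grid) and 0 <= iy < len(grid[0])):
        return True  # outside the mapped area counts as blocked
    return grid[ix][iy]
```

Treating out-of-bounds positions as collisions is a conservative choice: the robot is never allowed to leave the mapped area.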
Agent¶
The agent encapsulates the navigation model interface, receiving observations and producing actions.
Supported agents:

- LocalAgent - Local model
- RemoteAgent - Remote HTTP service
- ViNTAgent - ViNT model
- GNMAgent - GNM model
- NoMaDAgent - NoMaD model
- MultiModalNavAgent - Multi-modal navigation (language/image/object goals)
- LanguageNavAgent - Language navigation (Voronoi-based planning)
Dataset¶
The dataset module loads, validates, and iterates evaluation episode data.
Supported datasets:

- EpisodeDataset - Episode format dataset
Metrics¶
The metrics module computes SR (Success Rate), SPL (Success weighted by Path Length), and NE (Navigation Error).
Supported metrics:

- NavigationMetrics - Navigation metrics
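These metrics have standard definitions: SR is the fraction of successful episodes; SPL weights each success by the ratio of the shortest-path length l to the length p actually traveled, i.e. (1/N) Σ S · l / max(p, l); NE is the average final distance to the goal. A self-contained sketch of the aggregation (the per-episode record layout is an assumption about what NavigationMetrics consumes):

```python
def navigation_metrics(episodes):
    """Aggregate SR, SPL, and NE over a list of episode records.

    Each record: {"success": bool,
                  "shortest_path": float,  # geodesic start->goal length (m)
                  "agent_path": float,     # length actually traveled (m)
                  "final_dist": float}     # distance to goal at episode end (m)
    """
    n = len(episodes)
    sr = sum(e["success"] for e in episodes) / n
    # SPL_i = S_i * l_i / max(p_i, l_i); the max() guards against an agent
    # "beating" the shortest path due to grid discretization.
    spl = sum(
        e["success"] * e["shortest_path"] / max(e["agent_path"], e["shortest_path"])
        for e in episodes
    ) / n
    ne = sum(e["final_dist"] for e in episodes) / n
    return {"SR": sr, "SPL": spl, "NE": ne}
```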
Data Flow¶
```mermaid
sequenceDiagram
    participant Eval as Evaluator
    participant Dataset as Dataset
    participant Env as Environment
    participant Agent as Agent
    participant Metrics as Metrics
    Eval->>Dataset: Load episode
    Eval->>Env: Reset environment
    Eval->>Agent: Reset agent
    loop Each Step
        Env->>Agent: Observation
        Agent->>Env: Action
        Env->>Env: Update state
        Env->>Eval: Env info
    end
    Eval->>Metrics: Compute metrics
    Metrics->>Eval: Return results
    Eval->>Eval: Save results
```

Registration Mechanism¶
The framework uses a decorator registration mechanism to support extension:
Register Environment¶
```python
from navarena_bench.env.base import Env


@Env.register("my_env")
class MyEnvironment(Env):
    def __init__(self, env_config, task_config):
        super().__init__(env_config, task_config)

    # ... implement interface methods
```
Register Agent¶
```python
from navarena_bench.agent.base import Agent


@Agent.register("my_agent")
class MyAgent(Agent):
    def __init__(self, config):
        super().__init__(config)

    # ... implement interface methods
```
Register Evaluator¶
```python
from navarena_bench.evaluator.base import Evaluator


@Evaluator.register("my_eval")
class MyEvaluator(Evaluator):
    def __init__(self, config):
        super().__init__(config)

    # ... implement interface methods
```
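Such register decorators are typically backed by a per-base-class name-to-class table plus a factory lookup. A minimal sketch of the pattern (illustrative only; the actual navarena_bench base classes may differ):

```python
class Registrable:
    """Mixin providing a decorator-based name -> class registry."""

    @classmethod
    def register(cls, name):
        def decorator(subclass):
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def by_name(cls, name):
        return cls._registry[name]


class Env(Registrable):
    _registry = {}  # each base class keeps its own table


class Agent(Registrable):
    _registry = {}


@Env.register("my_env")
class MyEnvironment(Env):
    pass
```

Configuration strings such as `env_type: "gs"` can then be resolved to a class via `Env.by_name(...)` and instantiated with the parsed settings.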
Episode Data Format¶
The framework uses a standard Episode JSON format:
```json
{
  "episodes": [
    {
      "episode_id": "train_000001",
      "scene_path": "x2robot/17dc3367",
      "task_type": "pointnav",
      "start_state": {
        "position": [0.0, 0.0, 0.0],
        "rotation": [0.0, 0.0, 0.0, 1.0]
      },
      "goals": [
        {
          "goal_type": "position",
          "position": [5.0, 0.0, 0.0],
          "rotation": [0.0, 0.0, 0.383, 0.924]
        }
      ]
    }
  ]
}
```
Field descriptions:

- episode_id: Unique episode identifier
- scene_path: Scene path, relative to $NAVARENA_DATA_DIR/assets/
- task_type: Task type (pointnav | imagenav | objectnav | vln)
- start_state: Start position and rotation; rotation is a quaternion in [qx, qy, qz, qw] order
- goals: Goal list; each goal must include goal_type (position | image | object)
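Loading and sanity-checking a file in this format needs only the standard library. A minimal sketch (the actual EpisodeDataset may validate more than this):

```python
import json

REQUIRED = ("episode_id", "scene_path", "task_type", "start_state", "goals")
TASK_TYPES = {"pointnav", "imagenav", "objectnav", "vln"}


def load_episodes(path):
    """Parse an episode JSON file and check the required fields."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    episodes = data["episodes"]
    for ep in episodes:
        missing = [k for k in REQUIRED if k not in ep]
        if missing:
            raise ValueError(f"{ep.get('episode_id', '?')}: missing {missing}")
        if ep["task_type"] not in TASK_TYPES:
            raise ValueError(f"unknown task_type {ep['task_type']!r}")
        if len(ep["start_state"]["rotation"]) != 4:
            raise ValueError("rotation must be a quaternion [qx, qy, qz, qw]")
    return episodes
```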
Evaluation Configuration¶
Evaluation is configured via YAML:
```yaml
eval_type: "pointnav"
env:
  env_type: "gs"
  env_settings:
    camera_config: "${NAVARENA_DATA_DIR}/shared/camera.yaml"
    enable_occupancy: true
    success_distance: 0.5
agent:
  agent_type: "local"
  model_settings: {}
  device: null
task:
  task_type: "pointnav"
dataset:
  dataset_type: "episode"
  dataset_path: "$NAVARENA_DATA_DIR/datasets/navarena_dataset_v1/x2robot/17dc3367/pointnav"
eval_settings:
  num_episodes: 100
  output_path: "./eval_results"
  max_steps_per_episode: 500
```
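The configuration mixes both `${NAVARENA_DATA_DIR}` and `$NAVARENA_DATA_DIR` forms; after parsing the YAML into plain dicts and lists, both can be resolved with the standard library. A sketch of that expansion step only (assuming the loader resolves environment variables this way):

```python
import os


def expand_paths(node):
    """Recursively expand $VAR / ${VAR} in every string value of a parsed config."""
    if isinstance(node, dict):
        return {k: expand_paths(v) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_paths(v) for v in node]
    if isinstance(node, str):
        # os.path.expandvars handles both $VAR and ${VAR}; unknown
        # variables are left unchanged.
        return os.path.expandvars(node)
    return node
```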
Usage Flow¶
1. Prepare Data¶
Create an episode dataset in the correct format.
2. Configure Evaluation¶
Edit the evaluation configuration file.
3. Run Evaluation¶
4. View Results¶
Results are saved in the output directory:

- Evaluation metrics JSON
- Trajectory data (optional)
- Replay data (optional)
5. Generate Replay¶
Extensibility¶
Each component is extensible via the registration mechanism:
- New environment: Subclass `Env` and register
- New agent: Subclass `Agent` and register
- New evaluator: Subclass `Evaluator` and register
- New metric: Subclass `Metric` and register
- New replayer: Subclass `BaseReplayer` and register
See also: Environment · Agents · Evaluators · Replay · Extending the Framework