Agent Module

The agent module is the interface between navigation models and the evaluation framework: it receives observations and produces navigation actions. It supports local models, remote services, and pre-trained models.

Agent Types

LocalAgent

Local model agent that loads model files directly.

Configuration

agent:
  agent_type: "local"
  model_settings:
    checkpoint_path: "/path/to/model.pth"
  device: null  # null = auto-detect

Usage Example

from navarena_bench.agent import Agent
from navarena_bench.configs.agent_config import AgentCfg

config = AgentCfg(
    agent_type="local",
    model_settings={"checkpoint_path": "/path/to/model.pth"}
)

agent = Agent.init(config)

RemoteAgent

Remote service agent that invokes a remote model via HTTP API.

Configuration

agent:
  agent_type: "remote"
  model_settings:
    remote_url: "http://localhost:8000/api/v1/navigate"
    remote_timeout: 30.0
    remote_retries: 3

API Interface Format

Request:

{
  "observation": {
    "rgb": {
      "face": "base64_encoded_image",
      "left": "base64_encoded_image",
      "right": "base64_encoded_image"
    },
    "position": [0.0, 0.0, 0.0],
    "rotation": [1.0, 0.0, 0.0, 0.0]
  },
  "goal": {
    "position": [5.0, 0.0, 0.0]
  }
}

Response:

{
  "action": {
    "x": 0.5,
    "y": 0.0,
    "yaw": 0.1
  }
}
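
The request and response shapes above can be exercised without a running service. The following is a minimal sketch, using only the Python standard library, of building a request body and reading the action out of a response body. The helper names `build_request` and `parse_action` are hypothetical and not part of navarena_bench. The sketch also assumes each camera image is already an encoded image file (for example PNG bytes); the format above does not say which image encoding the service expects.

```python
import base64
import json


def build_request(images, position, rotation, goal_position):
    """Build the JSON request body (hypothetical helper).

    `images` maps a camera name ("face", "left", "right") to encoded image bytes.
    """
    payload = {
        "observation": {
            "rgb": {
                name: base64.b64encode(data).decode("ascii")
                for name, data in images.items()
            },
            "position": list(position),
            "rotation": list(rotation),
        },
        "goal": {"position": list(goal_position)},
    }
    return json.dumps(payload)


def parse_action(body):
    """Extract the action from a JSON response body and check its fields."""
    action = json.loads(body)["action"]
    missing = {"x", "y", "yaw"} - set(action)
    if missing:
        raise ValueError(f"response action is missing fields: {sorted(missing)}")
    return {key: float(action[key]) for key in ("x", "y", "yaw")}


# Placeholder bytes stand in for real encoded images.
request_body = build_request(
    images={"face": b"face-bytes", "left": b"left-bytes", "right": b"right-bytes"},
    position=[0.0, 0.0, 0.0],
    rotation=[1.0, 0.0, 0.0, 0.0],
    goal_position=[5.0, 0.0, 0.0],
)
action = parse_action('{"action": {"x": 0.5, "y": 0.0, "yaw": 0.1}}')
```

Checking the response fields on the client side turns a malformed reply into an explicit error instead of a failure later in the evaluation loop.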

Usage Example

config = AgentCfg(
    agent_type="remote",
    model_settings={
        "remote_url": "http://localhost:8000/api/v1/navigate",
        "remote_timeout": 30.0,
        "remote_retries": 3,
    }
)

agent = Agent.init(config)
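
How RemoteAgent applies remote_timeout and remote_retries internally is not described here. As a rough illustration of what a per-request timeout combined with a retry count can look like, here is a standard-library sketch. The functions `post_json` and `call_with_retries` are hypothetical, and the sketch assumes that the retry count means extra attempts after the first failure; the real agent may count attempts differently.

```python
import json
import time
import urllib.request


def post_json(url, payload, timeout):
    """Send one POST request with a JSON body and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def call_with_retries(send, retries, backoff=0.0):
    """Call send() and retry up to `retries` more times on network errors.

    Assumption: `retries` counts attempts made after the first failure.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return send()
        except OSError as error:  # covers URLError, timeouts, connection errors
            last_error = error
            if attempt < retries:
                time.sleep(backoff * (attempt + 1))  # linear backoff between tries
    raise RuntimeError(f"remote call failed after {retries + 1} attempts") from last_error


settings = {
    "remote_url": "http://localhost:8000/api/v1/navigate",
    "remote_timeout": 30.0,
    "remote_retries": 3,
}


def navigate(payload):
    """Send one navigation request using the settings above."""
    return call_with_retries(
        lambda: post_json(settings["remote_url"], payload, settings["remote_timeout"]),
        retries=settings["remote_retries"],
    )
```

With these settings a single failing request could block for roughly remote_timeout multiplied by the number of attempts, which is worth keeping in mind when sizing an evaluation run.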

ViNTAgent

ViNT (Visual Navigation Transformer) model agent.

Configuration

agent:
  agent_type: "vint"
  model_settings:
    checkpoint_path: "/path/to/checkpoint.pth"
    config_path: "/path/to/config.yaml"  # optional
    device: "cuda:0"  # optional

Usage Example

config = AgentCfg(
    agent_type="vint",
    model_settings={
        "checkpoint_path": "/path/to/checkpoint.pth"
    }
)

agent = Agent.init(config)

Dependencies

The ViNT agent requires the visualnav-transformer project.

GNMAgent

GNM (General Navigation Model) agent.

Configuration

agent:
  agent_type: "gnm"
  model_settings:
    checkpoint_path: "/path/to/checkpoint.pth"

NoMaDAgent

NoMaD (Goal Masked Diffusion Policies for Navigation and Exploration) agent.

Configuration

agent:
  agent_type: "nomad"
  model_settings:
    checkpoint_path: "/path/to/checkpoint.pth"

MultiModalNavAgent

Multi-modal navigation agent supporting language, image, and object goal types.

Configuration

agent:
  agent_type: "multimodal_nav"
  model_settings:
    checkpoint_path: "/path/to/model.pth"
    input_modalities: ["rgb", "depth"]  # optional
    fusion_method: "concat"  # optional

LanguageNavAgent

Language navigation agent for VLN tasks, using a Voronoi graph for path planning and exploration.

Configuration

agent:
  agent_type: "language_nav"
  model_settings:
    waypoint_tolerance: 0.3
    max_v: 0.5
    max_w: 1.0
    voronoi_closeness: 0.5
    min_voronoi_distance: 0.3
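
The settings above are not defined in detail on this page. One plausible reading is that waypoint_tolerance is the distance (in meters) at which a waypoint counts as reached, and that max_v and max_w cap the linear and angular speed. Under those assumptions, a waypoint-following step could be sketched as below. The function `waypoint_command` and the proportional gain are hypothetical and only illustrate the idea; they are not the agent's actual controller.

```python
import math


def clamp(value, limit):
    """Limit value to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def waypoint_command(pose, waypoint, waypoint_tolerance=0.3, max_v=0.5, max_w=1.0):
    """Return (v, w, reached) for a 2D pose (x, y, heading) and a waypoint (x, y)."""
    dx = waypoint[0] - pose[0]
    dy = waypoint[1] - pose[1]
    distance = math.hypot(dx, dy)
    if distance <= waypoint_tolerance:
        return 0.0, 0.0, True  # close enough: stop and move on to the next waypoint

    # Heading error wrapped to [-pi, pi].
    error = math.atan2(dy, dx) - pose[2]
    error = math.atan2(math.sin(error), math.cos(error))

    w = clamp(2.0 * error, max_w)  # proportional turn; the gain 2.0 is arbitrary
    v = clamp(distance, max_v) * max(0.0, math.cos(error))  # slow down when misaligned
    return v, w, False


v, w, reached = waypoint_command(pose=(0.0, 0.0, 0.0), waypoint=(5.0, 0.0))
```

Scaling the forward speed by the cosine of the heading error makes the robot turn roughly in place when the waypoint is off to the side, and drive at up to max_v once it is aligned.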

Agent Interface

All agents implement:

reset()

Reset agent state.

episode = {
    "episode_id": "001",
    "goals": [{"position": [5.0, 0.0, 0.0]}]
}

agent.reset(episode)

act()

Generate action from observation.

import numpy as np

observation = {
    "rgb": {
        "face": np.array(...),  # RGB image
        "left": np.array(...),
        "right": np.array(...)
    },
    "position": [0.0, 0.0, 0.0],
    "rotation": [1.0, 0.0, 0.0, 0.0]
}

action = agent.act(observation)
# {
#     "x": 0.5,      # Forward distance (meters)
#     "y": 0.0,      # Lateral distance (meters)
#     "yaw": 0.1     # Rotation angle (radians)
# }

close()

Close agent and release resources.

agent.close()
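
The three methods above are typically called in a fixed order: reset() once per episode, act() once per step, and close() once at the end. The sketch below shows that lifecycle with a stand-in agent and a toy one-dimensional environment. `StubAgent` and `run_episodes` are hypothetical names used only for illustration; a real evaluation would use an agent created with Agent.init and observations from the simulator.

```python
class StubAgent:
    """Hypothetical stand-in exposing the same reset/act/close methods."""

    def __init__(self):
        self.goal = None
        self.closed = False

    def reset(self, episode=None):
        # Remember the first goal position of the episode.
        self.goal = episode["goals"][0]["position"] if episode else None

    def act(self, observation):
        # Move up to 0.5 m toward the goal along x; never turn.
        remaining = self.goal[0] - observation["position"][0]
        return {"x": max(0.0, min(0.5, remaining)), "y": 0.0, "yaw": 0.0}

    def close(self):
        self.closed = True


def run_episodes(agent, episodes, max_steps=50):
    """Return the number of act() calls used per episode (toy 1D environment)."""
    steps_used = []
    try:
        for episode in episodes:
            agent.reset(episode)  # once per episode, before the first act()
            goal_x = episode["goals"][0]["position"][0]
            x = 0.0
            steps = 0
            while steps < max_steps and goal_x - x > 1e-6:
                observation = {
                    "rgb": {},  # images omitted in this toy loop
                    "position": [x, 0.0, 0.0],
                    "rotation": [1.0, 0.0, 0.0, 0.0],
                }
                action = agent.act(observation)
                x += action["x"]  # toy dynamics: heading fixed along world x
                steps += 1
            steps_used.append(steps)
    finally:
        agent.close()  # once, even if an episode raised
    return steps_used


agent = StubAgent()
episodes = [
    {"episode_id": "001", "goals": [{"position": [5.0, 0.0, 0.0]}]},
    {"episode_id": "002", "goals": [{"position": [1.2, 0.0, 0.0]}]},
]
steps_used = run_episodes(agent, episodes)
```

Putting close() in a finally block means model memory or network connections are released even when an episode fails partway through.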

Action Format

All agents return actions in this format:

{
    "x": float,    # Forward/backward (meters), positive=forward
    "y": float,    # Lateral (meters), positive=right
    "yaw": float   # Rotation (radians), positive=CCW
}
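
A small check like the following can catch malformed actions before they reach the environment. It is a sketch only: `validate_action` is a hypothetical helper, not a navarena_bench function, and the finiteness check is an extra safeguard that the format above does not require.

```python
import math

REQUIRED_KEYS = ("x", "y", "yaw")


def validate_action(action):
    """Return a cleaned copy of the action, or raise ValueError (hypothetical helper)."""
    if not isinstance(action, dict):
        raise ValueError(f"action must be a dict, got {type(action).__name__}")
    missing = [key for key in REQUIRED_KEYS if key not in action]
    if missing:
        raise ValueError(f"action is missing fields: {missing}")
    cleaned = {}
    for key in REQUIRED_KEYS:
        value = float(action[key])  # accepts ints and NumPy scalars as well
        if not math.isfinite(value):
            raise ValueError(f"action[{key!r}] is not finite: {value}")
        cleaned[key] = value
    return cleaned


checked = validate_action({"x": 1, "y": 0.0, "yaw": 0.1})
```

Calling such a check at the end of a custom agent's act() gives an error message that names the missing or invalid field.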

Observation Format

Observations received by agents:

{
    "rgb": {
        "camera_name": np.ndarray  # RGB, shape (H, W, 3)
    },
    "depth": {  # optional
        "camera_name": np.ndarray  # Depth, shape (H, W)
    },
    "position": [x, y, z],
    "rotation": [w, x, y, z]  # quaternion
}
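
Agents that plan in 2D often need a heading angle rather than a full quaternion. Given the [w, x, y, z] ordering above, the heading can be recovered as sketched below. The helper name `yaw_from_quaternion` is hypothetical, and the sketch assumes a unit quaternion and a world frame whose vertical axis is z; this page does not state which axis is up, so check that against the environment before relying on it.

```python
import math


def yaw_from_quaternion(rotation):
    """Rotation about the z axis, in radians, from a unit quaternion [w, x, y, z].

    Assumption: z is the vertical axis of the world frame.
    """
    w, x, y, z = rotation
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


identity_yaw = yaw_from_quaternion([1.0, 0.0, 0.0, 0.0])

# A quarter turn about z: w = cos(pi/4), z = sin(pi/4).
quarter_turn = [math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4)]
quarter_yaw = yaw_from_quaternion(quarter_turn)
```

The identity quaternion [1.0, 0.0, 0.0, 0.0] used in the examples on this page corresponds to a heading of zero under this convention.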

Agent Comparison

| Agent Type | Use Case | Pros | Cons |
|---|---|---|---|
| LocalAgent | Local models | Fast, no network latency | Requires model file |
| RemoteAgent | Remote services | Flexible, easy deploy | Network latency |
| ViNTAgent | Image goal nav | Pre-trained model | Extra dependencies |
| GNMAgent | General nav | Pre-trained model | Extra dependencies |
| NoMaDAgent | General nav | Pre-trained model | Extra dependencies |
| MultiModalNavAgent | Multi-modal input | Language/image/object goals | Complex config |
| LanguageNavAgent | VLN | Voronoi path planning | Requires language model |

Custom Agents

Implement Custom Agent

from navarena_bench.agent.base import Agent
from navarena_bench.configs.agent_config import AgentCfg

@Agent.register("my_agent")
class MyAgent(Agent):
    def __init__(self, config: AgentCfg):
        super().__init__(config)
        # Initialize model

    def reset(self, episode=None):
        """Reset agent state"""
        # Reset logic
        pass

    def act(self, observation):
        """Generate action"""
        # Action generation logic
        return {
            "x": 0.5,
            "y": 0.0,
            "yaw": 0.1
        }

    def close(self):
        """Release resources"""
        # Cleanup logic
        pass

Use Custom Agent

agent:
  agent_type: "my_agent"
  model_settings:
    checkpoint_path: "/path/to/model.pth"

# Import the custom agent's module first so that @Agent.register runs
import my_agent_module

agent = Agent.init(config)
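
The reason the import matters is that a decorator-based registry is only filled when the decorated class definition is executed. The following toy registry illustrates that mechanism. It is a sketch under assumptions, not the navarena_bench implementation: the `Agent` class here is a simplified stand-in, and the config is a plain dict rather than an AgentCfg.

```python
class Agent:
    """Toy base class with a name-to-class registry (not the navarena_bench code)."""

    _registry = {}

    def __init__(self, config):
        self.config = config

    @classmethod
    def register(cls, name):
        """Return a class decorator that stores the class under `name`."""
        def decorator(agent_cls):
            cls._registry[name] = agent_cls
            return agent_cls
        return decorator

    @classmethod
    def init(cls, config):
        """Look up the class for config["agent_type"] and instantiate it."""
        agent_type = config["agent_type"]
        if agent_type not in cls._registry:
            known = sorted(cls._registry)
            raise KeyError(f"unknown agent_type {agent_type!r}; registered: {known}")
        return cls._registry[agent_type](config)


# Registration happens here, when this class statement is executed.
@Agent.register("my_agent")
class MyAgent(Agent):
    def act(self, observation):
        return {"x": 0.5, "y": 0.0, "yaw": 0.1}


agent = Agent.init({"agent_type": "my_agent", "model_settings": {}})
```

If the module containing the decorated class is never imported, the lookup in this sketch fails with an unknown agent_type error, which is the kind of failure the explicit import avoids.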

FAQ

Model load failed

Check the model path and make sure the file exists and is in the expected format.

Remote service timeout

Increase remote_timeout, or check the network connection.

Action format error

Ensure the action dict includes the x, y, and yaw fields.

Observation format mismatch

Check that env observations match the agent’s expected format.

See also: Evaluator Module · Replay Module · Extending