Overview

Object-Goal Navigation (ObjectNav) requires an agent to autonomously explore an unknown environment and navigate toward target objects specified by a semantic label. Prior work has mainly studied zero-shot ObjectNav under 2D locomotion, and extending it to aerial platforms with 3D locomotion remains underexplored. Aerial robots offer superior maneuverability and search efficiency, but they also introduce new challenges in spatial perception, dynamic control, and safety assurance. In this paper, we propose AION for vision-based aerial ObjectNav without relying on external localization or global maps. AION is an end-to-end dual-policy reinforcement learning (RL) framework that decouples exploration and goal-reaching behaviors into two specialized policies. We evaluate AION on the AI2-THOR benchmark and further assess its real-time performance in IsaacSim using high-fidelity drone models. Experimental results show that AION outperforms the compared baselines on metrics covering exploration, navigation efficiency, and safety.

Details

1. Task

Indoor object-goal navigation for UAVs with 3D locomotion: the drone must autonomously explore an unknown environment and navigate toward a target object specified by a semantic label (e.g., “laptop”, “microwave”), without any prior map or external localization.

2. Framework

A dual-policy RL framework that switches between two modes based on target visibility:

  • Exploration Mode — maximize spatial coverage in unknown space
  • Goal-Reaching Mode — visual servoing toward the detected target object
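The switching rule can be sketched as below. This is a minimal illustration that assumes the target counts as visible whenever the detector returns a bounding box above a small area threshold. `select_mode`, `Mode`, and `min_area` are hypothetical names, and the authors' actual switching criterion may differ.

```python
from enum import Enum
from typing import Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
BBox = Tuple[float, float, float, float]


class Mode(Enum):
    EXPLORATION = "exploration"
    GOAL_REACHING = "goal_reaching"


def select_mode(target_bbox: Optional[BBox], min_area: float = 25.0) -> Mode:
    """Choose which policy acts this step, based on target visibility.

    Assumption: the target counts as visible when the detector returns a box
    whose area (pixels^2) is at least the hypothetical `min_area` threshold.
    """
    if target_bbox is None:
        return Mode.EXPLORATION
    x0, y0, x1, y1 = target_bbox
    area = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    return Mode.GOAL_REACHING if area >= min_area else Mode.EXPLORATION


# One control step: the detector output decides which policy runs.
print(select_mode(None))                      # Mode.EXPLORATION
print(select_mode((10.0, 10.0, 40.0, 40.0)))  # Mode.GOAL_REACHING
```

A small area threshold keeps tiny, likely spurious detections from pulling the drone out of exploration too early.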
AION dual-policy architecture
Depth-based ROI extraction

3. Exploration Mode

Input: depth map + ROI (region of interest). The ROI marks open, navigable areas in the depth image, mimicking how humans instinctively look toward open space when navigating. It is extracted with OpenCV-based image processing and provides only a directional cue, namely its centroid position \((d_x, d_y)\) and mean depth \(\bar{z}\), rather than absolute information about where unexplored space lies.
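Below is a minimal ROI sketch. It assumes "open" means depth beyond a fixed threshold and takes the ROI to be the largest 4-connected open region. The text does not specify the authors' OpenCV pipeline, so this version uses a plain NumPy flood fill to stay dependency-free. OpenCV's `cv2.connectedComponentsWithStats` could do the labeling step in one call. `extract_roi` and `open_thresh` are hypothetical, and the centroid offset is normalised to [-1, 1] relative to the image centre.

```python
from collections import deque

import numpy as np


def extract_roi(depth: np.ndarray, open_thresh: float = 2.0):
    """Return (d_x, d_y, z_mean) for the largest open region, or None.

    Hypothetical assumptions, not the authors' exact method:
    - a pixel is "open" if its depth exceeds `open_thresh` metres;
    - the ROI is the largest 4-connected open component;
    - (d_x, d_y) is the ROI centroid offset from the image centre in [-1, 1].
    """
    h, w = depth.shape
    open_mask = depth > open_thresh
    seen = np.zeros_like(open_mask, dtype=bool)
    best: list = []
    for sy in range(h):
        for sx in range(w):
            if not open_mask[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill of one connected open region.
            comp, queue = [], deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and open_mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    if not best:
        return None  # no open space visible
    ys, xs = np.array(best).T
    half_w = (w - 1) / 2 or 1.0
    half_h = (h - 1) / 2 or 1.0
    d_x = (xs.mean() - (w - 1) / 2) / half_w
    d_y = (ys.mean() - (h - 1) / 2) / half_h
    z_mean = depth[ys, xs].mean()
    return float(d_x), float(d_y), float(z_mean)


depth = np.full((5, 9), 1.0)
depth[:, 6:] = 5.0         # open corridor on the right side of the image
print(extract_roi(depth))  # (0.75, 0.0, 5.0)
```

A positive \(d_x\) here tells the exploration policy that the open space lies to the right of the current heading.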

Rewards:

\[r_t^E = R_{forward} + R_{center} + R_{safe}\]
  • \(R_{forward}\): reward for moving toward open space
  • \(R_{center}\): penalty (negative-valued) for yaw deviation from the ROI centroid
  • \(R_{safe}\): penalty (negative-valued) for collisions and proximity to obstacles
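Here is one way the three exploration terms could be computed at each step. All weights, the safety radius, and the linear proximity penalty are hypothetical choices for illustration. The penalties come back as negative values so they can simply be summed, matching \(r_t^E\) above.

```python
import math


def exploration_reward(forward_progress: float, yaw_error: float,
                       min_obstacle_dist: float, collided: bool,
                       w_fwd: float = 1.0, w_center: float = 0.5,
                       w_safe: float = 1.0, safe_dist: float = 0.5,
                       collision_penalty: float = 10.0) -> float:
    """r_t^E = R_forward + R_center + R_safe with hypothetical weights.

    forward_progress: metres moved toward the ROI this step.
    yaw_error: heading offset to the ROI centroid, radians in [-pi, pi].
    min_obstacle_dist: closest depth reading, metres.
    """
    r_forward = w_fwd * forward_progress
    # Penalty grows linearly from 0 (aligned) to -w_center (facing away).
    r_center = -w_center * abs(yaw_error) / math.pi
    if collided:
        r_safe = -collision_penalty
    else:
        # Linear penalty once the drone is closer than safe_dist to an obstacle.
        r_safe = -w_safe * max(0.0, safe_dist - min_obstacle_dist) / safe_dist
    return r_forward + r_center + r_safe


print(exploration_reward(0.2, 0.0, 1.0, False))  # aligned, safe, moving forward
```

Scaling the proximity penalty by `safe_dist` keeps it within [-w_safe, 0] until an actual collision occurs.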

4. Goal-Reaching Mode

Input: RGB image + frozen CLIP text embedding of the target label (CLIP aligns text and visual features in a shared embedding space, enabling zero-shot object recognition) + object/class bounding box.
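The sketch below shows how these inputs might be packed into one observation vector. Several details are assumptions: the image-feature size (256), the 512-dimensional text embedding (the size CLIP ViT-B/32 produces), the normalised box layout, and the visibility flag. Random vectors stand in for real encoder outputs, and `build_goal_observation` is a hypothetical name.

```python
from typing import Optional, Tuple

import numpy as np


def build_goal_observation(rgb_feat: np.ndarray, text_emb: np.ndarray,
                           bbox: Optional[Tuple[float, float, float, float]],
                           img_w: int, img_h: int) -> np.ndarray:
    """Concatenate image features, text embedding, and box (hypothetical layout)."""
    # L2-normalise the frozen text embedding; CLIP features are usually compared this way.
    text = text_emb / (np.linalg.norm(text_emb) + 1e-8)
    if bbox is None:
        box = np.zeros(5, dtype=np.float32)  # no detection: zero box, visibility flag 0
    else:
        x0, y0, x1, y1 = bbox
        box = np.array([x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h, 1.0],
                       dtype=np.float32)
    return np.concatenate([rgb_feat.astype(np.float32), text.astype(np.float32), box])


rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal(256)  # stand-in for a hypothetical image encoder output
text_emb = rng.standard_normal(512)  # stand-in for a precomputed CLIP embedding of "laptop"
obs = build_goal_observation(rgb_feat, text_emb, (32, 48, 96, 112), img_w=128, img_h=128)
print(obs.shape)  # (773,)
```

Because the text encoder is frozen, the label embedding can be computed once per episode and reused at every step.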

Rewards:

\[r_t^G = R_{dist} + R_{bbox} + R_{parent} + R_{suc} - R_{collision}\]
  • \(R_{dist}\): reward for reducing Euclidean distance to target
  • \(R_{bbox}\): reward for centering and enlarging the target bounding box in the field of view (indicates approaching the object)
  • \(R_{parent}\): parent-class reward, a partial reward for reaching the target's parent object (e.g., reaching a desk when the target is a laptop on that desk)
  • \(R_{suc}\): task success reward
  • \(R_{collision}\): collision penalty
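The goal-reaching reward can be sketched as follows. The bonus and penalty magnitudes are hypothetical, and so is the shaping for \(R_{bbox}\), which here combines the box-centre offset with the fraction of the image the box covers. The caller is assumed to pass `reached_parent=True` only on the first step the parent object is reached, so the partial reward is not collected repeatedly.

```python
import math


def bbox_term(bbox, img_w, img_h, w_center=0.5, w_area=1.0):
    """Hypothetical R_bbox: reward a centred box and a large box."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Distance of the box centre from the image centre, scaled to [0, 1].
    off = math.hypot((cx - img_w / 2) / (img_w / 2),
                     (cy - img_h / 2) / (img_h / 2)) / math.sqrt(2)
    area_frac = max(0.0, x1 - x0) * max(0.0, y1 - y0) / (img_w * img_h)
    return w_center * (1.0 - off) + w_area * area_frac


def goal_reward(prev_dist, dist, bbox, img_w, img_h,
                reached_parent, success, collided,
                parent_bonus=1.0, success_bonus=10.0, collision_penalty=10.0):
    """r_t^G = R_dist + R_bbox + R_parent + R_suc - R_collision (hypothetical scales)."""
    r_dist = prev_dist - dist  # positive when the drone got closer to the target
    r_bbox = bbox_term(bbox, img_w, img_h) if bbox is not None else 0.0
    r_parent = parent_bonus if reached_parent else 0.0
    r_suc = success_bonus if success else 0.0
    r_collision = collision_penalty if collided else 0.0
    return r_dist + r_bbox + r_parent + r_suc - r_collision


print(goal_reward(3.0, 2.5, (25, 25, 75, 75), 100, 100, False, False, False))  # 1.25
```

Another common design uses the step-to-step change in `bbox_term` rather than its absolute value, so that hovering in front of the target earns nothing.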

5. Action Space

Discrete 3D actions: forward, turn left/right, ascend, descend, etc.
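The listed actions can be modeled as fixed pose increments. This is only a kinematic sketch. The step sizes are hypothetical, and the enum covers just the actions named above (the "etc." suggests there are more). In IsaacSim each action would be executed by a flight controller rather than by setting the pose directly.

```python
import math
from enum import Enum
from typing import Tuple

Pose = Tuple[float, float, float, float]  # (x, y, z, yaw): metres and radians

STEP = 0.25                    # hypothetical translation step (m)
YAW_STEP = math.radians(30.0)  # hypothetical turn step (rad)


class Action(Enum):
    FORWARD = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    ASCEND = 3
    DESCEND = 4


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def apply_action(pose: Pose, action: Action) -> Pose:
    """Apply one discrete action; yaw is counter-clockwise positive (left turn)."""
    x, y, z, yaw = pose
    if action is Action.FORWARD:
        x += STEP * math.cos(yaw)
        y += STEP * math.sin(yaw)
    elif action is Action.TURN_LEFT:
        yaw = wrap_angle(yaw + YAW_STEP)
    elif action is Action.TURN_RIGHT:
        yaw = wrap_angle(yaw - YAW_STEP)
    elif action is Action.ASCEND:
        z += STEP
    elif action is Action.DESCEND:
        z -= STEP
    return (x, y, z, yaw)


print(apply_action((0.0, 0.0, 1.0, 0.0), Action.FORWARD))  # (0.25, 0.0, 1.0, 0.0)
```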

6. Evaluation

Evaluated on two simulators: AI2-THOR (standard benchmark with seen/unseen object splits) and IsaacSim (larger multi-room environments where the target may be in a different room).

AI2-THOR Benchmark
Model          Split   Seen SR   Seen SPL   Unseen SR   Unseen SPL
BaseModel      18/4    76.7      39.9       81.5        36.4
Scene Prior    18/4    74.3      42.1       83.7        41.9
MJO            18/4    81.2      52.0       90.7        51.7
SSNet          18/4    72.3      50.4       77.8        50.0
Ours (AION)    18/4    88.7      57.9       95.0        55.2
BaseModel      14/8    73.3      47.3       70.8        46.6
Scene Prior    14/8    79.3      52.7       71.0        44.8
MJO            14/8    78.8      43.6       83.0        45.6
SSNet          14/8    79.2      44.3       81.8        46.4
Ours (AION)    14/8    84.7      61.2       87.0        60.5
Split = number of seen/unseen object classes. SR = Success Rate (%), SPL = Success weighted by Path Length (%).
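The text above does not define SR and SPL. Assuming the standard embodied-navigation definitions, \(\mathrm{SR} = \frac{1}{N}\sum_{i=1}^{N} S_i\) and \(\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \frac{\ell_i}{\max(p_i, \ell_i)}\). Here \(S_i \in \{0, 1\}\) marks success in episode \(i\), \(\ell_i\) is the shortest-path length to the target, and \(p_i\) is the length of the path the agent actually took. Both metrics are reported as percentages. The episode values below are made up purely to exercise the functions.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """SR in percent: fraction of episodes that end in success."""
    return 100.0 * sum(successes) / len(successes)


def spl(successes: Sequence[bool], shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """SPL in percent: (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    if not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("all inputs need one entry per episode")
    total = 0.0
    for ok, opt_len, path_len in zip(successes, shortest, taken):
        if ok:
            total += opt_len / max(path_len, opt_len)
    return 100.0 * total / len(successes)


# Toy episodes: success with a 2x detour, optimal success, failure.
print(success_rate([True, True, False]))                            # ~66.67
print(spl([True, True, False], [2.0, 3.0, 4.0], [4.0, 3.0, 10.0]))  # 50.0
```

SPL is never larger than SR. It equals SR only when every successful episode follows a shortest path, which explains why the SPL columns in the table sit well below the SR columns.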
IsaacSim Cross-Scene
Algorithm     Object       Chem.   Beech.   Ihlen
Exp+MJO       Sofa         3/5     4/5      4/5
Exp+MJO       Plant        2/5     5/5      5/5
Exp+MJO       Laptop       0/5     3/5      5/5
Exp+MJO       Microwave    2/5     5/5      2/5
Exp+SSNet     Sofa         3/5     4/5      3/5
Exp+SSNet     Plant        3/5     2/5      3/5
Exp+SSNet     Laptop       0/5     3/5      5/5
Exp+SSNet     Microwave    1/5     5/5      3/5
AION          Sofa         4/5     4/5      5/5
AION          Plant        5/5     5/5      4/5
AION          Laptop       2/5     5/5      5/5
AION          Microwave    3/5     5/5      5/5
Columns are IsaacSim scenes. SR = Success Rate, given as successes / 5 trials per object and scene.
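Pooling the table over objects and scenes gives each method's overall success count. The snippet only re-adds the numbers already in the table: 12 object/scene pairs × 5 trials = 60 runs per method. `RESULTS` and `totals` are illustrative names.

```python
from typing import Dict, Tuple

# Successes out of 5 trials, copied from the table above: (Chem., Beech., Ihlen).
RESULTS: Dict[str, Dict[str, Tuple[int, int, int]]] = {
    "Exp+MJO": {"Sofa": (3, 4, 4), "Plant": (2, 5, 5), "Laptop": (0, 3, 5), "Microwave": (2, 5, 2)},
    "Exp+SSNet": {"Sofa": (3, 4, 3), "Plant": (3, 2, 3), "Laptop": (0, 3, 5), "Microwave": (1, 5, 3)},
    "AION": {"Sofa": (4, 4, 5), "Plant": (5, 5, 4), "Laptop": (2, 5, 5), "Microwave": (3, 5, 5)},
}
TRIALS = 5


def totals(results, trials: int = TRIALS) -> Dict[str, Tuple[int, int]]:
    """Return {method: (successes, runs)} pooled over all objects and scenes."""
    out = {}
    for algo, per_object in results.items():
        wins = sum(sum(scenes) for scenes in per_object.values())
        runs = trials * sum(len(scenes) for scenes in per_object.values())
        out[algo] = (wins, runs)
    return out


for algo, (wins, runs) in totals(RESULTS).items():
    print(f"{algo}: {wins}/{runs} = {100 * wins / runs:.1f}%")
```

With the table's values this prints 40/60 (66.7%) for Exp+MJO, 35/60 (58.3%) for Exp+SSNet, and 52/60 (86.7%) for AION.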
IsaacSim scenes and target objects
Exploration trajectories in Beechwood
