IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity

Abstract

Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions.

We present IR², a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR² allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation to maintain a sparse yet informative graph, enabling our approach to scale to large-scale environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6% to 34.1% shorter exploration paths and significantly improved mapped area consistency among robots when compared to state-of-the-art baselines.

Approach

IR² proposes a novel information-sharing strategy that achieves high exploration efficiency by estimating the future impact of current exploration and rendezvous decisions. Our work makes three key contributions: (1) We use an attention-based neural network trained by deep reinforcement learning (DRL) to help robots learn to sequence non-myopic decisions. (2) We implement two-stage curriculum learning, where robots are placed in increasingly difficult exploration environments with increasing frequency and duration of disconnectivity. This drives robots to learn complex, dynamic connectivity strategies to attain even higher exploration efficiency. (3) We utilize a hierarchical graph formulation to enable scaling of our strategy to large-scale environments. This involves maintaining both a sparse global graph representation of the robots’ map and a dense local graph centered on the robot. Combining graphs at different spatial scales helps robots strike a balance between long- and short-term exploration and rendezvous goals.
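The hierarchical graph idea in contribution (3) can be illustrated with a minimal sketch. It is not the implementation used in IR²: the grid-cell representation, the function names `build_hierarchical_nodes` and `connect`, and the parameters `sparse_step` and `local_radius` are all assumptions made for illustration. The sketch keeps every free cell near the robot as a dense local node set and keeps only a coarse subsample of free cells as a sparse global node set.

```python
import math


def build_hierarchical_nodes(free_cells, robot_xy, sparse_step=4, local_radius=5.0):
    """Split free grid cells into a dense local and a sparse global node set.

    All names and parameters here are hypothetical, chosen for illustration.
    free_cells: set of (x, y) integer grid cells known to be free.
    robot_xy:   (x, y) position of the robot in grid coordinates.
    """
    rx, ry = robot_xy
    local_nodes = set()
    global_nodes = set()
    for (x, y) in free_cells:
        # Dense local graph: every free cell within local_radius of the robot.
        if math.hypot(x - rx, y - ry) <= local_radius:
            local_nodes.add((x, y))
        # Sparse global graph: only cells on a coarse lattice over the whole map.
        if x % sparse_step == 0 and y % sparse_step == 0:
            global_nodes.add((x, y))
    return local_nodes, global_nodes


def connect(nodes, step):
    """Link each node to its right and upper neighbour at the given spacing."""
    edges = set()
    for (x, y) in nodes:
        for dx, dy in ((step, 0), (0, step)):
            neighbour = (x + dx, y + dy)
            if neighbour in nodes:
                edges.add(((x, y), neighbour))
    return edges


if __name__ == "__main__":
    # A fully free 21 x 21 toy map with the robot at its centre.
    cells = {(x, y) for x in range(21) for y in range(21)}
    local_nodes, global_nodes = build_hierarchical_nodes(
        cells, (10, 10), sparse_step=5, local_radius=3.0
    )
    print(len(local_nodes), "local nodes,", len(global_nodes), "global nodes")
    print(len(connect(global_nodes, 5)), "global edges")
```

In this toy setting the global node count grows with the map area divided by `sparse_step` squared, while the local node count depends only on `local_radius`, which is the property that lets a graph of this kind stay small as the environment grows.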

Python Training

We develop a two-stage curriculum to allow robots to learn complex information-sharing strategies incrementally. The first stage trains robots on basic exploration skills such as moving to frontiers efficiently, while the second stage trains robots to handle situations with prolonged disconnectivity. Robots exchange map, graph, and position information when they are within communication range.
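The range-gated exchange described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the IR² code: the cell encoding (`UNKNOWN`, `FREE`, `OCCUPIED`), the dictionary-based robot state, the function names, and the rule that an occupied reading wins over a free one are all hypothetical. Only the map is merged here; graph and position sharing are omitted.

```python
import math

# Hypothetical cell encoding, ordered so that max() prefers known over
# unknown cells and occupied over free cells.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def in_range(pos_a, pos_b, comm_range):
    """Proximity-based connectivity check between two robot positions."""
    return math.dist(pos_a, pos_b) <= comm_range


def merge_maps(map_a, map_b):
    """Cell-wise merge of two equally sized occupancy grids (lists of rows)."""
    return [
        [max(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]


def exchange_if_connected(robot_a, robot_b, comm_range):
    """Merge both robots' maps in place if they are within comm_range.

    Returns True if an exchange happened, False otherwise.
    """
    if not in_range(robot_a["pos"], robot_b["pos"], comm_range):
        return False
    merged = merge_maps(robot_a["map"], robot_b["map"])
    # Give each robot its own copy so later updates stay independent.
    robot_a["map"] = [row[:] for row in merged]
    robot_b["map"] = [row[:] for row in merged]
    return True


if __name__ == "__main__":
    a = {"pos": (0.0, 0.0), "map": [[FREE, UNKNOWN], [UNKNOWN, UNKNOWN]]}
    b = {"pos": (3.0, 4.0), "map": [[UNKNOWN, OCCUPIED], [UNKNOWN, FREE]]}
    print(exchange_if_connected(a, b, comm_range=5.0))
    print(a["map"])
```

While the robots are out of range, each map evolves independently, which is why the timing of reconnection affects how consistent the robots' mapped areas remain.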

Gazebo Simulator Baseline Comparison

We compare IR² with a pursuit-based approach (Pursuit) and a preplanned rendezvous approach (Preplanned) in the Indoor (130 m x 100 m) Gazebo environment. In general, IR² outperforms these baselines, and its distance and time efficiency both improve as the number of robots increases. Our approach generalizes to both proximity-based and signal-strength-based communication constraints.

Other Environments

IR² also performs well in the Forest (150 m x 150 m) and Campus (340 m x 340 m) Gazebo environments.

BibTeX

@INPROCEEDINGS{derek2024IR2,
      author={Tan, Derek Ming Siang and Ma, Yixiao and Liang, Jingsong and Chng, Yi Cheng and Cao, Yuhong and Sartoretti, Guillaume},
      booktitle={2024 IEEE International Conference on Intelligent Robots and Systems (IROS)}, 
      title={IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity}, 
      year={2024},
      doi={10.48550/arXiv.2409.04730}
    }