Robotics 31
☆ ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks
Unmanned Aerial Vehicles (UAVs) depend on onboard sensors for perception,
navigation, and control. However, these sensors are susceptible to physical
attacks, such as GPS spoofing, that can corrupt state estimates and lead to
unsafe behavior. While reinforcement learning (RL) offers adaptive control
capabilities, existing safe RL methods are ineffective against such attacks. We
present ARMOR (Adaptive Robust Manipulation-Optimized State Representations),
an attack-resilient, model-free RL controller that enables robust UAV operation
under adversarial sensor manipulation. Instead of relying on raw sensor
observations, ARMOR learns a robust latent representation of the UAV's physical
state via a two-stage training framework. In the first stage, a teacher
encoder, trained with privileged attack information, generates attack-aware
latent states for RL policy training. In the second stage, a student encoder is
trained via supervised learning to approximate the teacher's latent states
using only historical sensor data, enabling real-world deployment without
privileged information. Our experiments show that ARMOR outperforms
conventional methods, ensuring UAV safety. Additionally, ARMOR improves
generalization to unseen attacks and reduces training cost by eliminating the
need for iterative adversarial training.
☆ Reinforcement Learning with Physics-Informed Symbolic Program Priors for Zero-Shot Wireless Indoor Navigation
When using reinforcement learning (RL) to tackle physical control tasks,
inductive biases that encode physics priors can help improve sample efficiency
during training and enhance generalization in testing. However, the current
practice of incorporating these helpful physics-informed inductive biases
inevitably requires significant manual labor and domain expertise, making them
prohibitive for general users. This work explores a symbolic approach to
distill physics-informed inductive biases into RL agents, where the physics
priors are expressed in a domain-specific language (DSL) that is human-readable
and naturally explainable. Yet, the DSL priors do not translate directly into
an implementable policy due to partial and noisy observations and additional
physical constraints in navigation tasks. To address this gap, we develop a
physics-informed program-guided RL (PiPRL) framework with applications to
indoor navigation. PiPRL adopts a hierarchical and modularized neuro-symbolic
integration, where a meta symbolic program receives semantically meaningful
features from a neural perception module, which form the bases for symbolic
programming that encodes physics priors and guides the RL process of a
low-level neural controller. Extensive experiments demonstrate that PiPRL
consistently outperforms purely symbolic or neural policies and reduces
training time by over 26% with the help of the program-based inductive biases.
comment: Spotlight paper at Reinforcement Learning Conference 2025, Workshop
on Inductive Biases in Reinforcement Learning
☆ Robotic Multimodal Data Acquisition for In-Field Deep Learning Estimation of Cover Crop Biomass
Accurate weed management is essential for mitigating significant crop yield
losses, necessitating effective weed suppression strategies in agricultural
systems. Integrating cover crops (CC) offers multiple benefits, including soil
erosion reduction, weed suppression, decreased nitrogen requirements, and
enhanced carbon sequestration, all of which are closely tied to the aboveground
biomass (AGB) they produce. However, biomass production varies significantly
due to microsite variability, making accurate estimation and mapping essential
for identifying zones of poor weed suppression and optimizing targeted
management strategies. To address this challenge, developing a comprehensive CC
map, including its AGB distribution, will enable informed decision-making
regarding weed control methods and optimal application rates. Manual visual
inspection is impractical and labor-intensive, especially given the extensive
field size and the wide diversity and variation of weed species and sizes. In
this context, optical imagery and Light Detection and Ranging (LiDAR) data are
two prominent sources with unique characteristics that enhance AGB estimation.
This study introduces a ground robot-mounted multimodal sensor system designed
for agricultural field mapping. The system integrates optical and LiDAR data,
leveraging machine learning (ML) methods for data fusion to improve biomass
predictions. The best ML-based model for dry AGB estimation achieved a
coefficient of determination value of 0.88, demonstrating robust performance in
diverse field conditions. This approach offers valuable insights for
site-specific management, enabling precise weed suppression strategies and
promoting sustainable farming practices.
comment: Accepted as an extended abstract at the 22nd International Conference
on Ubiquitous Robots (UR 2025), Texas, USA
☆ Robust and Accurate Multi-view 2D/3D Image Registration with Differentiable X-ray Rendering and Dual Cross-view Constraints ICRA 2025
Robust and accurate 2D/3D registration, which aligns preoperative models with
intraoperative images of the same anatomy, is crucial for successful
interventional navigation. To mitigate the challenge of a limited field of view
in single-image intraoperative scenarios, multi-view 2D/3D registration is
required by leveraging multiple intraoperative images. In this paper, we
propose a novel multi-view 2D/3D rigid registration approach comprising two
stages. In the first stage, a combined loss function is designed, incorporating
both the differences between predicted and ground-truth poses and the
dissimilarities (e.g., normalized cross-correlation) between simulated and
observed intraoperative images. More importantly, additional cross-view
training loss terms are introduced for both pose and image losses to explicitly
enforce cross-view constraints. In the second stage, test-time optimization is
performed to refine the estimated poses from the coarse stage. Our method
exploits the mutual constraints of multi-view projection poses to enhance the
robustness of the registration process. The proposed framework achieves a mean
target registration error (mTRE) of $0.79 \pm 2.17$ mm on six specimens from
the DeepFluoro dataset, demonstrating superior performance compared to
state-of-the-art registration algorithms.
comment: ICRA 2025
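The abstract above names normalized cross-correlation (NCC) as one image dissimilarity between simulated and observed images. A minimal sketch of that measure follows, assuming images are passed as flat lists of intensities; the function names and the `1 - NCC` dissimilarity convention are illustrative assumptions, not taken from the paper.

```python
import math


def normalized_cross_correlation(a, b):
    """Return the NCC of two equally sized images given as flat intensity lists."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # Subtract the means so the score ignores global brightness offsets.
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A constant image has no variance; report 0 by convention (assumption).
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def ncc_dissimilarity(a, b):
    """Hypothetical loss term: 0 for perfectly correlated images, 2 for inverted ones."""
    return 1.0 - normalized_cross_correlation(a, b)
```

Because the means and norms are divided out, the score is unchanged by a positive gain or an offset applied to either image, which is one reason it is commonly used to compare rendered and real X-ray images.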
☆ KnotDLO: Toward Interpretable Knot Tying ICRA 2024
This work presents KnotDLO, a method for one-handed Deformable Linear Object
(DLO) knot tying that is robust to occlusion, repeatable for varying rope
initial configurations, interpretable for generating motion policies, and
requires no human demonstrations or training. Grasp and target waypoints for
future DLO states are planned from the current DLO shape. Grasp poses are
computed from indexing the tracked piecewise linear curve representing the DLO
state based on the current curve shape and are piecewise continuous. KnotDLO
computes intermediate waypoints from the geometry of the current DLO state and
the desired next state. The system decouples visual reasoning from control. In
16 trials of knot tying, KnotDLO achieves a 50% success rate in tying an
overhand knot from previously unseen configurations.
comment: 4 pages, 5 figures, presented at the Workshop on 3D Visual
Representations for Manipulation at the 2024 IEEE International Conference on
Robotics and Automation in Yokohama, Japan. Video presentation
[https://youtu.be/mg30uCUtpOk]. Poster
[https://hollydinkel.github.io/assets/pdf/ICRA20243DVRM_poster.pdf] 3DVRM
Workshop [https://3d-manipulation-workshop.github.io/]
☆ ASVSim (AirSim for Surface Vehicles): A High-Fidelity Simulation Framework for Autonomous Surface Vehicle Research
Bavo Lesy, Siemen Herremans, Robin Kerstens, Jan Steckel, Walter Daems, Siegfried Mercelis, Ali Anwar
The transport industry has recently shown significant interest in unmanned
surface vehicles (USVs), specifically for port and inland waterway transport.
These systems can improve operational efficiency and safety, which is
especially relevant in the European Union, where initiatives such as the Green
Deal are driving a shift towards increased use of inland waterways. At the same
time, a shortage of qualified personnel is accelerating the adoption of
autonomous solutions. However, there is a notable lack of open-source,
high-fidelity simulation frameworks and datasets for developing and evaluating
such solutions. To address these challenges, we introduce AirSim For Surface
Vehicles (ASVSim), an open-source simulation framework specifically designed
for autonomous shipping research in inland and port environments. The framework
combines simulated vessel dynamics with marine sensor simulation capabilities,
including radar and camera systems, and supports the generation of synthetic
datasets for training computer vision models and reinforcement learning agents.
Built upon Cosys-AirSim, ASVSim provides a comprehensive platform for
developing autonomous navigation algorithms and generating synthetic datasets.
The simulator supports research on both traditional control methods and deep
learning-based approaches. Through limited experiments, we demonstrate the
potential of the simulator in these research areas. ASVSim is provided as an
open-source project under the MIT license, making autonomous navigation
research accessible to a larger part of the ocean engineering community.
comment: 14 Pages, 11 Figures
☆ RM-Dijkstra: A surface optimal path planning algorithm based on Riemannian metric
The Dijkstra algorithm is a classic path planning method, which operates in a
discrete graph space to determine the shortest path from a specified source
point to a target node or all other nodes based on non-negative edge weights.
Numerous studies have focused on the Dijkstra algorithm due to its potential
applications. However, its application in surface path planning for mobile
robots remains largely unexplored. In this letter, a surface optimal path
planning algorithm called RM-Dijkstra is proposed, which is based on a
Riemannian metric model. By constructing a new Riemannian metric on the 2D
projection plane, the surface optimal path planning problem is transformed into
a geometric problem on the 2D plane with the new Riemannian metric. Induced by
the standard Euclidean metric on the surface, the constructed metric reflects
environmental information of the robot and ensures that the projection map is
an isometric immersion. A series of simulation tests demonstrates that the
RM-Dijkstra algorithm not only
effectively solves the optimal path planning problem on surfaces, but also
outperforms traditional path planning algorithms in terms of path accuracy and
smoothness, particularly in complex scenarios.
comment: 7 pages
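To illustrate the idea in the RM-Dijkstra abstract of planning in the 2D projection plane while measuring length on the surface, here is a rough sketch. It assumes the surface is a height field sampled on a regular grid and weights each grid edge by its 3D length, a discrete stand-in for the metric induced by the Euclidean metric on the surface. The function name, the 8-connected neighborhood, and the `cell` spacing parameter are assumptions for illustration, not the authors' construction.

```python
import heapq
import math


def surface_dijkstra(height, start, goal, cell=1.0):
    """Dijkstra on a height-field grid; edge cost is the 3D length of the step."""
    rows, cols = len(height), len(height[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist[node]:
            continue  # stale queue entry
        r, c = node
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            planar = cell * math.hypot(dr, dc)      # length in the projection plane
            dz = height[nr][nc] - height[r][c]      # elevation change along the edge
            nd = d + math.hypot(planar, dz)         # length measured on the surface
            if nd < dist.get((nr, nc), math.inf):
                dist[(nr, nc)] = nd
                prev[(nr, nc)] = node
                heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return math.inf, []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path
```

On a flat grid this reduces to ordinary 8-connected Dijkstra; on a grid with a tall peak, the surface-length weights make the planner route around the peak instead of over it.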
☆ Evaluating Pointing Gestures for Target Selection in Human-Robot Collaboration
Pointing gestures are a common interaction method used in Human-Robot
Collaboration for various tasks, ranging from selecting targets to guiding
industrial processes. This study introduces a method for localizing pointed
targets within a planar workspace. The approach employs pose estimation and a
simple geometric model based on shoulder-wrist extension to extract gesturing
data from an RGB-D stream. The study proposes a rigorous methodology and
comprehensive analysis for evaluating pointing gestures and target selection in
typical robotic tasks. In addition to evaluating tool accuracy, the tool is
integrated into a proof-of-concept robotic system, which includes object
detection, speech transcription, and speech synthesis to demonstrate the
integration of multiple modalities in a collaborative application. Finally, a
discussion of tool limitations and performance is provided to understand its
role in multimodal robotic systems. All developments are available at:
https://github.com/NMKsas/gesture_pointer.git.
comment: Accepted by the 2025 34th IEEE International Conference on Robot and
Human Interactive Communication (RO-MAN). Preprint
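The shoulder-wrist extension model in the abstract above can be pictured as a ray cast from the shoulder through the wrist and intersected with the workspace plane. The sketch below makes that assumption explicit; the function name, argument layout, and the choice to return `None` for rays that miss are hypothetical and are not taken from the linked repository.

```python
def pointed_target(shoulder, wrist, plane_point, plane_normal, eps=1e-9):
    """Intersect the shoulder-to-wrist ray with a plane; return the 3D hit or None."""
    # Ray direction: from the shoulder joint through the wrist joint.
    direction = [w - s for s, w in zip(shoulder, wrist)]
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < eps:
        return None  # arm is parallel to the workspace plane
    offset = [p - s for s, p in zip(shoulder, plane_point)]
    t = sum(o * n for o, n in zip(offset, plane_normal)) / denom
    if t < 0.0:
        return None  # plane lies behind the shoulder, so the arm points away
    return tuple(s + t * d for s, d in zip(shoulder, direction))
```

For a table at height zero, `plane_point=(0, 0, 0)` and `plane_normal=(0, 0, 1)` would be used, with joint positions expressed in the same frame as the plane.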
☆ An Introduction to Zero-Order Optimization Techniques for Robotics
Zero-order optimization techniques are becoming increasingly popular in
robotics due to their ability to handle non-differentiable functions and escape
local minima. These advantages make them particularly useful for trajectory
optimization and policy optimization. In this work, we present a mathematical
tutorial on random search. It offers a simple and unifying perspective for
understanding a wide range of algorithms commonly used in robotics. Leveraging
this viewpoint, we classify many trajectory optimization methods under a common
framework and derive novel competitive RL algorithms.
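As a concrete picture of the random search that the tutorial above builds on, here is a minimal greedy variant: perturb the current best point with Gaussian noise and keep the candidate only if the cost drops. It uses only cost evaluations, so it applies to non-differentiable objectives. The function name, step size `sigma`, and iteration count are illustrative assumptions, and this is a generic textbook scheme rather than any specific algorithm from the paper.

```python
import random


def random_search(cost, x0, sigma=0.5, iterations=2000, seed=0):
    """Greedy zero-order search: sample around the incumbent, keep improvements."""
    rng = random.Random(seed)
    best_x = list(x0)
    best_cost = cost(best_x)
    for _ in range(iterations):
        # Gaussian perturbation of the incumbent; no gradient is ever computed.
        candidate = [xi + rng.gauss(0.0, sigma) for xi in best_x]
        c = cost(candidate)
        if c < best_cost:
            best_x, best_cost = candidate, c
    return best_x, best_cost


def l1_cost(x):
    """Example non-differentiable objective with its minimum at (1, -2)."""
    return abs(x[0] - 1.0) + abs(x[1] + 2.0)
```

Calling `random_search(l1_cost, [5.0, 5.0])` returns the best point found and its cost; the returned cost can never exceed the cost of the starting point because only improvements are accepted.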
☆ Multi-Robot Assembly of Deformable Linear Objects Using Multi-Modal Perception
Kejia Chen, Celina Dettmering, Florian Pachler, Zhuo Liu, Yue Zhang, Tailai Cheng, Jonas Dirr, Zhenshan Bing, Alois Knoll, Rüdiger Daub
Industrial assembly of deformable linear objects (DLOs) such as cables offers
great potential for many industries. However, DLOs pose several challenges for
robot-based automation due to the inherent complexity of deformation and,
consequently, the difficulties in anticipating the behavior of DLOs in
dynamic situations. Although existing studies have addressed isolated
subproblems like shape tracking, grasping, and shape control, there has been
limited exploration of integrated workflows that combine these individual
processes. To address this gap, we propose an object-centric perception and
planning framework to achieve a comprehensive DLO assembly process throughout
the industrial value chain. The framework utilizes visual and tactile
information to track the DLO's shape as well as contact state across different
stages, which facilitates effective planning of robot actions. Our approach
encompasses robot-based bin picking of DLOs from cluttered environments,
followed by a coordinated handover to two additional robots that mount the DLOs
onto designated fixtures. Real-world experiments employing a setup with
multiple robots demonstrate the effectiveness of the approach and its relevance
to industrial scenarios.
☆ LMPVC and Policy Bank: Adaptive voice control for industrial robots with code generating LLMs and reusable Pythonic policies
Modern industry is increasingly moving away from mass manufacturing towards
more specialized and personalized products. As manufacturing tasks become more
complex, full automation is not always an option, and human involvement may be
required. This has increased the need for advanced human-robot collaboration
(HRC), and with it, improved methods for interaction, such as voice control.
Recent advances in natural language processing, driven by artificial
intelligence (AI), have the potential to answer this demand. Large language
models (LLMs) have rapidly developed very impressive general reasoning
capabilities, and many methods of applying these to robotics have been proposed,
including through the use of code generation. This paper presents Language
Model Program Voice Control (LMPVC), an LLM-based prototype voice control
architecture with integrated policy programming and teaching capabilities,
built for use with Robot Operating System 2 (ROS2) compatible robots. The
architecture builds on prior works using code generation for voice control by
implementing an additional programming and teaching system, the Policy Bank. We
find this system can compensate for the limitations of the underlying LLM, and
allow LMPVC to adapt to different downstream tasks without a slow and costly
training process. The architecture and additional results are released on
GitHub (https://github.com/ozzyuni/LMPVC).
comment: Accepted by the 2025 34th IEEE International Conference on Robot and
Human Interactive Communication (RO-MAN). For further information, videos and
code, see https://github.com/ozzyuni/LMPVC
☆ A MILP-Based Solution to Multi-Agent Motion Planning and Collision Avoidance in Constrained Environments
We propose a mixed-integer linear program (MILP) for multi-agent motion
planning that embeds Polytopic Action-based Motion Planning (PAAMP) into a
sequence-then-solve pipeline. Region sequences confine each agent to adjacent
convex polytopes, while a big-M hyperplane model enforces inter-agent
separation. Collision constraints are applied only to agents sharing or
neighboring a region, which reduces binary variables exponentially compared
with naive formulations. An L1 path-length-plus-acceleration cost yields smooth
trajectories. We prove finite-time convergence and demonstrate on
representative multi-agent scenarios with obstacles that our formulation
produces collision-free trajectories an order of magnitude faster than an
unstructured MILP baseline.
comment: Accepted to 2025 IEEE International Conference on Automation Science
and Engineering (CASE 2025)
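For readers unfamiliar with big-M separation constraints like those mentioned in the abstract above, one common textbook form is sketched here. All symbols are assumptions introduced for illustration: $p_i(t)$ and $p_j(t)$ are the positions of agents $i$ and $j$ at step $t$, $a_k$ are unit normals of $K$ candidate separating hyperplanes, $d$ is the minimum separation, $M$ is a constant larger than $d$ plus the workspace diameter, and $b_{ij,k}(t)$ are binary variables. The paper's exact formulation may differ.

```latex
% Generic big-M separation between agents i and j at time step t (illustrative).
% When b_{ij,k}(t) = 1 the k-th hyperplane must separate the agents by at least d;
% when b_{ij,k}(t) = 0 the constraint is relaxed by the large constant M.
\begin{align}
  a_k^\top \bigl(p_i(t) - p_j(t)\bigr) &\ge d - M \bigl(1 - b_{ij,k}(t)\bigr),
    & k &= 1, \dots, K, \\
  \sum_{k=1}^{K} b_{ij,k}(t) &\ge 1,
    & b_{ij,k}(t) &\in \{0, 1\}.
\end{align}
```

Each agent pair and time step contributes $K$ binaries in this form, which is why restricting the constraints to agents that share or neighbor a region, as the abstract describes, shrinks the problem.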
☆ SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model CVPR 2025
Shuhan Tan, John Lambert, Hong Jeon, Sakshum Kulshrestha, Yijing Bai, Jing Luo, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang
The goal of traffic simulation is to augment a potentially limited amount of
manually-driven miles that is available for testing and validation, with a much
larger amount of simulated synthetic miles. The culmination of this vision
would be a generative simulated city, where given a map of the city and an
autonomous vehicle (AV) software stack, the simulator can seamlessly simulate
the trip from point A to point B by populating the city around the AV and
controlling all aspects of the scene, from animating the dynamic agents (e.g.,
vehicles, pedestrians) to controlling the traffic light states. We refer to
this vision as CitySim, which requires an agglomeration of simulation
technologies: scene generation to populate the initial scene, agent behavior
modeling to animate the scene, occlusion reasoning, dynamic scene generation to
seamlessly spawn and remove agents, and environment simulation for factors such
as traffic lights. While some key technologies have been separately studied in
various works, others such as dynamic scene generation and environment
simulation have received less attention in the research community. We propose
SceneDiffuser++, the first end-to-end generative world model trained on a
single loss function capable of point A-to-B simulation on a city scale
integrating all the requirements above. We demonstrate the city-scale traffic
simulation capability of SceneDiffuser++ and study its superior realism under
long simulation conditions. We evaluate the simulation quality on an augmented
version of the Waymo Open Motion Dataset (WOMD) with larger map regions to
support trip-level simulation.
comment: Accepted to CVPR 2025
☆ Integrating Multi-Modal Sensors: A Review of Fusion Techniques for Intelligent Vehicles
Multi-sensor fusion plays a critical role in enhancing perception for
autonomous driving, overcoming individual sensor limitations, and enabling
comprehensive environmental understanding. This paper first formalizes
multi-sensor fusion strategies into data-level, feature-level, and
decision-level categories and then provides a systematic review of deep
learning-based methods corresponding to each strategy. We present key
multi-modal datasets and discuss their applicability in addressing real-world
challenges, particularly in adverse weather conditions and complex urban
environments. Additionally, we explore emerging trends, including the
integration of Vision-Language Models (VLMs), Large Language Models (LLMs), and
the role of sensor fusion in end-to-end autonomous driving, highlighting its
potential to enhance system adaptability and robustness. Our work offers
valuable insights into current methods and future directions for multi-sensor
fusion in autonomous driving.
comment: Accepted by IEEE IV 2025
☆ Embodied Domain Adaptation for Object Detection IROS 2025
Mobile robots rely on object detectors for perception and object localization
in indoor environments. However, standard closed-set methods struggle to handle
the diverse objects and dynamic conditions encountered in real homes and labs.
Open-vocabulary object detection (OVOD), driven by Vision Language Models
(VLMs), extends beyond fixed labels but still struggles with domain shifts in
indoor environments. We introduce a Source-Free Domain Adaptation (SFDA)
approach that adapts a pre-trained model without accessing source data. We
refine pseudo labels via temporal clustering, employ multi-scale threshold
fusion, and apply a Mean Teacher framework with contrastive learning. Our
Embodied Domain Adaptation for Object Detection (EDAOD) benchmark evaluates
adaptation under sequential changes in lighting, layout, and object diversity.
Our experiments show significant gains in zero-shot detection performance and
flexible adaptation to dynamic indoor conditions.
comment: Accepted by IROS 2025
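The Mean Teacher framework mentioned in the abstract above keeps a teacher model whose weights are an exponential moving average (EMA) of the student's weights. A minimal sketch of that update follows, with weights flattened to a list of floats; the function name and the default `decay` value are assumptions, and the paper's pseudo-label refinement and contrastive terms are not shown.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an EMA of the current student weights."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    # A decay close to 1 makes the teacher change slowly, smoothing noisy student updates.
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]
```

In a training loop this would typically be called once after every student optimizer step, with the teacher producing the pseudo labels for the next batch.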
☆ Skill-Nav: Enhanced Navigation with Versatile Quadrupedal Locomotion via Waypoint Interface
Quadrupedal robots have demonstrated exceptional locomotion capabilities
through Reinforcement Learning (RL), including extreme parkour maneuvers.
However, integrating locomotion skills with navigation in quadrupedal robots,
which holds promise for enhancing long-distance movement capabilities, has not
been fully investigated. In this paper, we propose Skill-Nav, a
method that incorporates quadrupedal locomotion skills into a hierarchical
navigation framework using waypoints as an interface. Specifically, we train a
waypoint-guided locomotion policy using deep RL, enabling the robot to
autonomously adjust its locomotion skills to reach targeted positions while
avoiding obstacles. Compared with direct velocity commands, waypoints offer a
simpler yet more flexible interface for high-level planning and low-level
control. Utilizing waypoints as the interface allows for the application of
various general planning tools, such as large language models (LLMs) and path
planning algorithms, to guide our locomotion policy in traversing terrains with
diverse obstacles. Extensive experiments conducted in both simulated and
real-world scenarios demonstrate that Skill-Nav can effectively traverse
complex terrains and complete challenging navigation tasks.
comment: 17 pages, 6 figures
♻ ☆ eCAV: An Edge-Assisted Evaluation Platform for Connected Autonomous Vehicles
Tyler Landle, Jordan Rapp, Dean Blank, Chandramouli Amarnath, Abhijit Chatterjee, Alexandros Daglis, Umakishore Ramachandran
As autonomous vehicles edge closer to widespread adoption, enhancing road
safety through collision avoidance and minimization of collateral damage
becomes imperative. Vehicle-to-everything (V2X) technologies, which include
vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), and vehicle-to-cloud
(V2C), are being proposed as mechanisms to achieve this safety improvement.
Simulation-based testing is crucial for early-stage evaluation of Connected
Autonomous Vehicle (CAV) control systems, offering a safer and more
cost-effective alternative to real-world tests. However, simulating large 3D
environments with many complex single- and multi-vehicle sensors and
controllers is computationally intensive. There is currently no evaluation
framework that can effectively evaluate realistic scenarios involving large
numbers of autonomous vehicles.
We propose eCAV -- an efficient, modular, and scalable evaluation platform to
facilitate both functional validation of algorithmic approaches to increasing
road safety and performance prediction of algorithms of various V2X
technologies, including a futuristic Vehicle-to-Edge control plane and
correspondingly designed control algorithms. eCAV can model up to 256 vehicles
running individual control algorithms without perception enabled, which is
$8\times$ more vehicles than what is possible with state-of-the-art
alternatives.
♻ ☆ FEAST: A Flexible Mealtime-Assistance System Towards In-the-Wild Personalization
Rajat Kumar Jenamani, Tom Silver, Ben Dodson, Shiqin Tong, Anthony Song, Yuting Yang, Ziang Liu, Benjamin Howe, Aimee Whitneck, Tapomayukh Bhattacharjee
Physical caregiving robots hold promise for improving the quality of life of
millions worldwide who require assistance with feeding. However, in-home meal
assistance remains challenging due to the diversity of activities (e.g.,
eating, drinking, mouth wiping), contexts (e.g., socializing, watching TV),
food items, and user preferences that arise during deployment. In this work, we
propose FEAST, a flexible mealtime-assistance system that can be personalized
in-the-wild to meet the unique needs of individual care recipients. Developed
in collaboration with two community researchers and informed by a formative
study with a diverse group of care recipients, our system is guided by three
key tenets for in-the-wild personalization: adaptability, transparency, and
safety. FEAST embodies these principles through: (i) modular hardware that
enables switching between assisted feeding, drinking, and mouth-wiping, (ii)
diverse interaction methods, including a web interface, head gestures, and
physical buttons, to accommodate diverse functional abilities and preferences,
and (iii) parameterized behavior trees that can be safely and transparently
adapted using a large language model. We evaluate our system based on the
personalization requirements identified in our formative study, demonstrating
that FEAST offers a wide range of transparent and safe adaptations and
outperforms a state-of-the-art baseline limited to fixed customizations. To
demonstrate real-world applicability, we conduct an in-home user study with two
care recipients (who are community researchers), feeding them three meals each
across three diverse scenarios. We further assess FEAST's ecological validity
by evaluating with an Occupational Therapist previously unfamiliar with the
system. In all cases, users successfully personalize FEAST to meet their
individual needs and preferences. Website: https://emprise.cs.cornell.edu/feast
comment: RSS 2025 - Best Paper Award
♻ ☆ AirLine: Efficient Learnable Line Detection with Local Edge Voting
Line detection is widely used in many robotic tasks such as scene
recognition, 3D reconstruction, and simultaneous localization and mapping
(SLAM). Compared to points, lines can provide both low-level and high-level
geometrical information for downstream tasks. In this paper, we propose a novel
learnable edge-based line detection algorithm, AirLine, which can be applied to
various tasks. In contrast to existing learnable endpoint-based methods, which
are sensitive to the geometrical condition of environments, AirLine can extract
line segments directly from edges, resulting in a better generalization ability
for unseen environments. To balance efficiency and accuracy, we introduce a
region-grow algorithm and a local edge voting scheme for line parameterization.
To the best of our knowledge, AirLine is one of the first learnable edge-based
line detection methods. Our extensive experiments have shown that it retains
state-of-the-art-level precision, yet with a 3 to 80 times runtime acceleration
compared to other learning-based methods, which is critical for low-power
robots.
♻ ☆ UAV-based path planning for efficient localization of non-uniformly distributed weeds using prior knowledge: A reinforcement-learning approach
UAVs are becoming popular in agriculture; however, they usually use
time-consuming row-by-row flight paths. This paper presents a
deep-reinforcement-learning-based approach for path planning to efficiently
localize weeds in agricultural fields using UAVs with minimal flight-path
length. The method combines prior knowledge about the field containing
uncertain, low-resolution weed locations with in-flight weed detections. The
search policy was learned using deep Q-learning. We trained the agent in
simulation, allowing a thorough evaluation of the weed distribution, typical
errors in the perception system, prior knowledge, and different stopping
criteria on the planner's performance. When weeds were non-uniformly
distributed over the field, the agent found them faster than a row-by-row path,
showing its capability to learn and exploit the weed distribution. Detection
errors and prior knowledge quality had a minor effect on the performance,
indicating that the learned search policy was robust to detection errors and
did not need detailed prior knowledge. The agent also learned to terminate the
search. To test the transferability of the learned policy to a real-world
scenario, the planner was tested on real-world image data without further
training, which showed a 66% shorter path compared to a row-by-row path at the
cost of a 10% lower percentage of found weeds. Strengths and weaknesses of the
planner for practical application are comprehensively discussed, and directions
for further development are provided. Overall, it is concluded that the learned
search policy can improve the efficiency of finding non-uniformly distributed
weeds using a UAV and shows potential for use in agricultural practice.
♻ ☆ RESPLE: Recursive Spline Estimation for LiDAR-Based Odometry
We present a novel recursive Bayesian estimation framework using B-splines
for continuous-time 6-DoF dynamic motion estimation. The state vector consists
of a recurrent set of position control points and orientation control point
increments, enabling efficient estimation via a modified iterated extended
Kalman filter without involving error-state formulations. The resulting
recursive spline estimator (RESPLE) is further leveraged to develop a versatile
suite of direct LiDAR-based odometry solutions, supporting the integration of
one or multiple LiDARs and an IMU. We conduct extensive real-world evaluations
using public datasets and our own experiments, covering diverse sensor setups,
platforms, and environments. Compared to existing systems, RESPLE achieves
comparable or superior estimation accuracy and robustness, while attaining
real-time efficiency. Our results and analysis demonstrate RESPLE's strength in
handling highly dynamic motions and complex scenes within a lightweight and
flexible design, showing strong potential as a universal framework for
multi-sensor motion estimation. We release the source code and experimental
datasets at https://github.com/ASIG-X/RESPLE.
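To show how a continuous-time position can be read off a set of spline control points, as in the RESPLE abstract above, here is a sketch that evaluates one segment of a uniform cubic B-spline from four consecutive control points. The cubic order, uniform knot spacing, and function name are assumptions for illustration; the paper's spline order and its orientation handling are not reproduced here.

```python
def cubic_bspline_point(p0, p1, p2, p3, u):
    """Evaluate a uniform cubic B-spline segment at normalized time u in [0, 1]."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    # Standard uniform cubic B-spline basis functions; they sum to 1 for every u.
    b0 = (1.0 - u) ** 3 / 6.0
    b1 = (3.0 * u ** 3 - 6.0 * u ** 2 + 4.0) / 6.0
    b2 = (-3.0 * u ** 3 + 3.0 * u ** 2 + 3.0 * u + 1.0) / 6.0
    b3 = u ** 3 / 6.0
    # Blend the four control points coordinate by coordinate.
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
```

Only four control points influence any given time instant, which is what allows a recursive estimator to keep a small, sliding set of control points in its state.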
♻ ☆ Efficient Reconfiguration of Tile Arrangements by a Single Active Robot
Aaron T. Becker, Sándor P. Fekete, Jonas Friemel, Ramin Kosfeld, Peter Kramer, Harm Kube, Christian Rieck, Christian Scheffer, Arne Schmidt
We consider the problem of reconfiguring a two-dimensional connected grid
arrangement of passive building blocks from a start configuration to a goal
configuration, using a single active robot that can move on the tiles, remove
individual tiles from a given location and physically move them to a new
position by walking on the remaining configuration. The objective is to
determine a schedule that minimizes the overall makespan, while keeping the
tile configuration connected.
We provide both negative and positive results. (1) We generalize the problem
by introducing weighted movement costs, which can vary depending on whether
tiles are carried or not, and prove that this variant is NP-hard. (2) We give a
polynomial-time constant-factor approximation algorithm for the case of
disjoint start and target bounding boxes, which additionally yields optimal
carry distance for 2-scaled instances.
comment: 19 pages, 15 figures, to appear in the proceedings of the 37th
Canadian Conference on Computational Geometry (CCCG 2025)
♻ ☆ TrajFlow: Learning Distributions over Trajectories for Human Behavior Prediction
Predicting the future behavior of human road users is an important aspect for
the development of risk-aware autonomous vehicles. While many models have been
developed towards this end, effectively capturing and predicting the
variability inherent to human behavior remains an open challenge. This
paper proposes TrajFlow, a new approach for probabilistic trajectory
prediction based on Normalizing Flows. We reformulate the problem of capturing
distributions over trajectories into capturing distributions over abstracted
trajectory features using an autoencoder, simplifying the learning task of the
Normalizing Flows. TrajFlow outperforms state-of-the-art behavior prediction
models in capturing full trajectory distributions in two synthetic benchmarks
with known true distributions, and is competitive on the naturalistic datasets
ETH/UCY, rounD, and nuScenes. Our results demonstrate the effectiveness of
TrajFlow in probabilistic prediction of human behavior.
♻ ☆ Hierarchical Intention-Aware Expressive Motion Generation for Humanoid Robots
Effective human-robot interaction requires robots to identify human
intentions and generate expressive, socially appropriate motions in real-time.
Existing approaches often rely on fixed motion libraries or computationally
expensive generative models. We propose a hierarchical framework that combines
intention-aware reasoning via in-context learning (ICL) with real-time motion
generation using diffusion models. Our system introduces structured prompting
with confidence scoring, fallback behaviors, and social context awareness to
enable intention refinement and adaptive response. Leveraging large-scale
motion datasets and efficient latent-space denoising, the framework generates
diverse, physically plausible gestures suitable for dynamic humanoid
interactions. Experimental validation on a physical platform demonstrates the
robustness and social alignment of our method in realistic scenarios.
comment: 7 pages, 2 figures, IEEE conference paper
♻ ☆ Mitigating Metropolitan Carbon Emissions with Dynamic Eco-driving at Scale
Vindula Jayawardana, Baptiste Freydt, Ao Qu, Cameron Hickert, Edgar Sanchez, Catherine Tang, Mark Taylor, Blaine Leonard, Cathy Wu
The sheer scale and diversity of transportation make it a formidable sector
to decarbonize. Here, we consider an emerging opportunity to reduce carbon
emissions: the growing adoption of semi-autonomous vehicles, which can be
programmed to mitigate stop-and-go traffic through intelligent speed commands
and, thus, reduce emissions. But would such dynamic eco-driving move the needle
on climate change? A comprehensive impact analysis has been out of reach due to
the vast array of traffic scenarios and the complexity of vehicle emissions. We
address this challenge with large-scale scenario modeling efforts and by using
multi-task deep reinforcement learning with a carefully designed network
decomposition strategy. We perform an in-depth prospective impact assessment of
dynamic eco-driving at 6,011 signalized intersections across three major US
metropolitan cities, simulating a million traffic scenarios. Overall, we find
that vehicle trajectories optimized for emissions can cut city-wide
intersection carbon emissions by 11-22% without harming throughput or safety,
which, under reasonable assumptions, is equivalent to the national emissions of
Israel and Nigeria, respectively. We find that 10% eco-driving adoption yields 25%-50%
of the total reduction, and nearly 70% of the benefits come from 20% of
intersections, suggesting near-term implementation pathways. However, the
composition of this high-impact subset of intersections varies considerably
across different adoption levels, with minimal overlap, calling for careful
strategic planning for eco-driving deployments. Moreover, the impact of
eco-driving, when considered jointly with projections of vehicle
electrification and hybrid vehicle adoption, remains significant. More broadly,
this work paves the way for large-scale analysis of traffic externalities, such
as time, safety, and air quality, and the potential impact of solution
strategies.
comment: Accepted for publication at Transportation Research Part C: Emerging
Technologies
♻ ☆ TritonZ: A Remotely Operated Underwater Rover with Manipulator Arm for Exploration and Rescue Operations
The increasing demand for underwater exploration and rescue operations
drives the development of advanced wireless or semi-wireless underwater
vessels equipped with manipulator arms. This paper presents the implementation
of a semi-wireless underwater vehicle, "TritonZ", equipped with a manipulator
arm, tailored for effective underwater exploration and rescue operations. The
vehicle's compact design enables deployment in different submarine
surroundings, addressing the need for wireless systems capable of navigating
challenging underwater terrains. The manipulator arm can interact with the
environment, allowing the robot to perform sophisticated tasks during
exploration and rescue missions in emergency situations. TritonZ is equipped
with various sensors, such as a Pi-Camera and humidity and temperature sensors,
to send real-time environmental data. The vehicle, controlled using a
customized remote controller, can navigate efficiently in the water, while the
Pi-Camera enables live streaming of the surroundings. Motion control and video
capture are performed simultaneously using this camera. The manipulator arm is
designed to perform various tasks, such as grasping, manipulating, and
collecting underwater objects. Experimental results show the efficacy of the
proposed remotely operated vehicle in performing a variety of underwater
exploration and rescue tasks. Additionally, the results show that TritonZ can
maintain an average velocity of 13.5 cm/s with a minimal delay of 2-3 seconds.
Furthermore, the vehicle can withstand underwater waves while maintaining its
position and average velocity. The full project details and source code
can be accessed at this link: https://github.com/kawser-ahmed-byte/TritonZ
comment: 7 pages, 5 figures
♻ ☆ Cooperative Bearing-Only Target Pursuit via Multiagent Reinforcement Learning: Design and Experiment IROS 2025
This paper addresses the multi-robot pursuit problem for an unknown target,
encompassing both target state estimation and pursuit control. First, in state
estimation, we focus on using only bearing information, as it is readily
available from vision sensors and effective for small, distant targets.
Challenges such as instability due to the nonlinearity of bearing measurements
and singularities in the two-angle representation are addressed through a
proposed uniform bearing-only information filter. This filter integrates
multiple 3D bearing measurements, provides a concise formulation, and enhances
stability and resilience to target loss caused by limited field of view (FoV).
Second, in target pursuit control within complex environments, where challenges
such as heterogeneity and limited FoV arise, conventional methods like
differential games or Voronoi partitioning often prove inadequate. To address
these limitations, we propose a novel multiagent reinforcement learning (MARL)
framework, enabling multiple heterogeneous vehicles to search, localize, and
follow a target while effectively handling those challenges. Third, to bridge
the sim-to-real gap, we propose two key techniques: incorporating adjustable
low-level control gains in training to replicate the dynamics of real-world
autonomous ground vehicles (AGVs), and introducing spectral-normalized RL
algorithms to enhance policy smoothness and robustness. Finally, we demonstrate
the successful zero-shot transfer of the MARL controllers to AGVs, validating
the effectiveness and practical feasibility of our approach. The accompanying
video is available at https://youtu.be/HO7FJyZiJ3E.
comment: To appear in the 2025 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2025)
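The singularity of the two-angle bearing representation mentioned in the abstract above can be illustrated with a short sketch. It assumes an azimuth/elevation convention and compares it with a unit-vector bearing; the function name `angles_to_unit` and the choice of a unit vector as the alternative are assumptions for illustration and do not reproduce the paper's filter.

```python
import math

def angles_to_unit(azimuth, elevation):
    """Convert an (azimuth, elevation) bearing in radians to a 3D unit vector.

    Assumed convention: azimuth is measured in the x-y plane from the x-axis,
    elevation is measured up from the x-y plane.
    """
    return (
        math.cos(elevation) * math.cos(azimuth),
        math.cos(elevation) * math.sin(azimuth),
        math.sin(elevation),
    )

# At elevation = pi/2 the target is straight overhead, and every azimuth
# maps to (numerically) the same unit vector: the azimuth is undefined there.
overhead_a = angles_to_unit(0.3, math.pi / 2)
overhead_b = angles_to_unit(2.0, math.pi / 2)
```

Both `overhead_a` and `overhead_b` point along the z-axis up to floating-point error, so the two-angle form loses one degree of freedom at that point, whereas the unit vector stays well defined.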
♻ ☆ Estimating Spatially-Dependent GPS Errors Using a Swarm of Robots
Praneeth Somisetty, Robert Griffin, Victor M. Baez, Miguel F. Arevalo-Castiblanco, Aaron T. Becker, Jason M. O'Kane
External factors, including urban canyons and adversarial interference, can
lead to Global Positioning System (GPS) inaccuracies that vary as a function of
the position in the environment. This study addresses the challenge of
estimating a static, spatially-varying error function using a team of robots.
We introduce a State Bias Estimation Algorithm (SBE) whose purpose is to
estimate the GPS biases. The central idea is to use sensed estimates of the
range and bearing to the other robots in the team to estimate changes in bias
across the environment. A set of drones moves in a 2D environment, each
sampling data from GPS, range, and bearing sensors. The biases calculated by
the SBE at estimated positions are used to train a Gaussian Process Regression
(GPR) model. We use a Sparse Gaussian process-based Informative Path Planning
(IPP) algorithm that identifies high-value regions of the environment for data
collection. The swarm plans paths that maximize information gain in each
iteration, further refining their understanding of the environment's positional
bias landscape. We evaluated SBE and IPP in simulation and compared the IPP
methodology to an open-loop strategy.
comment: 6 pages, 7 figures, 2025 IEEE 21st International Conference on
Automation Science and Engineering
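The central idea in the abstract above, using inter-robot range and bearing to estimate changes in GPS bias, can be sketched as follows. The sketch assumes noise-free measurements, a bearing already expressed in the global frame, and an additive bias model (GPS reading = true position + bias); the function names are hypothetical and this is not the authors' SBE implementation.

```python
import math

def relative_position(range_m, bearing_rad):
    """Displacement from robot i to robot j implied by range and global bearing."""
    return (range_m * math.cos(bearing_rad), range_m * math.sin(bearing_rad))

def bias_difference(gps_i, gps_j, range_m, bearing_rad):
    """Estimate b_j - b_i under the assumed model gps = position + bias.

    gps_j - gps_i = (p_j - p_i) + (b_j - b_i), and the range/bearing
    measurement supplies p_j - p_i, so subtracting it leaves the bias change.
    """
    dx, dy = relative_position(range_m, bearing_rad)
    return (gps_j[0] - gps_i[0] - dx, gps_j[1] - gps_i[1] - dy)
```

For example, with robot i at (0, 0) with bias (1, 0.5) and robot j at (3, 4) with bias (2, -1), the GPS readings are (1, 0.5) and (5, 3), and the recovered bias difference is (1, -1.5).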
♻ ☆ Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Shuo Wang, Yongcai Wang, Wanting Li, Xudong Cai, Yucheng Wang, Maiyue Chen, Kaihui Wang, Zhizhong Su, Deying Li, Zhaoxin Fan
Vision-Language Navigation (VLN) is a critical task for developing embodied
agents that can follow natural language instructions to navigate in complex
real-world environments. Recent advances in VLN by large pretrained models have
significantly improved generalization and instruction grounding compared to
traditional approaches. However, the role of reasoning strategies in
navigation (an action-centric, long-horizon task) remains underexplored, despite
Chain-of-Thought (CoT) reasoning's demonstrated success in static tasks like
visual question answering. To address this gap, we conduct the first systematic
evaluation of reasoning strategies for VLN, including No-Think (direct action
prediction), Pre-Think (reason before action), and Post-Think (reason after
action). Surprisingly, our findings reveal the Inference-time Reasoning
Collapse issue, where inference-time reasoning degrades navigation accuracy,
highlighting the challenges of integrating reasoning into VLN. Based on this
insight, we propose Aux-Think, a framework that trains models to internalize
structured reasoning patterns through CoT supervision, while inferring action
directly without reasoning in online prediction. To support this framework, we
release R2R-CoT-320k, the first Chain-of-Thought annotated dataset for VLN.
Extensive experiments show that Aux-Think greatly reduces training effort and
achieves the best performance under the same data scale.
♻ ☆ Haptic-ACT -- Pseudo Oocyte Manipulation by a Robot Using Multimodal Information and Action Chunking with Transformers IROS2025
In this paper we introduce Haptic-ACT, an advanced robotic system for pseudo
oocyte manipulation, integrating multimodal information and Action Chunking
with Transformers (ACT). Traditional automation methods for oocyte transfer
rely heavily on visual perception, often requiring human supervision due to
biological variability and environmental disturbances. Haptic-ACT enhances ACT
by incorporating haptic feedback, enabling real-time grasp failure detection
and adaptive correction. Additionally, we introduce a 3D-printed TPU soft
gripper to facilitate delicate manipulations. Experimental results demonstrate
that Haptic-ACT improves the task success rate, robustness, and adaptability
compared to conventional ACT, particularly in dynamic environments. These
findings highlight the potential of multimodal learning in robotics for
biomedical automation.
comment: Accepted at IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS2025). Project website:
https://tanichu-laboratory.github.io/pedro_haptic_act_iros2025/
♻ ☆ QT-DoG: Quantization-aware Training for Domain Generalization ICML
A key challenge in Domain Generalization (DG) is preventing overfitting to
source domains, which can be mitigated by finding flatter minima in the loss
landscape. In this work, we propose Quantization-aware Training for Domain
Generalization (QT-DoG) and demonstrate that weight quantization effectively
leads to flatter minima in the loss landscape, thereby enhancing domain
generalization. Unlike traditional quantization methods focused on model
compression, QT-DoG exploits quantization as an implicit regularizer by
inducing noise in model weights, guiding the optimization process toward
flatter minima that are less sensitive to perturbations and overfitting. We
provide both an analytical perspective and empirical evidence demonstrating
that quantization inherently encourages flatter minima, leading to better
generalization across domains. Moreover, with the benefit of reducing the model
size through quantization, we demonstrate that an ensemble of multiple
quantized models yields accuracy superior to that of state-of-the-art DG
approaches with no computational or memory overhead. Code is released at:
https://saqibjaved1.github.io/QT_DoG/.
comment: Accepted at International Conference on Machine Learning (ICML) 2025.
Project website: https://saqibjaved1.github.io/QT_DoG/
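The claim in the abstract above that quantization induces noise in model weights can be illustrated with a minimal sketch. It assumes a uniform min-max quantizer, which the abstract does not specify; the names `quantize_weights` and `quantization_noise` are hypothetical and this is not the QT-DoG training procedure.

```python
def quantize_weights(weights, num_bits=8):
    """Uniformly quantize a list of floats to 2**num_bits levels (assumed scheme)."""
    levels = 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    if w_max == w_min:
        # A constant tensor needs no quantization grid.
        return list(weights)
    scale = (w_max - w_min) / levels
    # Snap each weight to the nearest grid point between w_min and w_max.
    return [w_min + round((w - w_min) / scale) * scale for w in weights]

def quantization_noise(weights, num_bits=8):
    """Per-weight perturbation introduced by quantization (quantized - original)."""
    quantized = quantize_weights(weights, num_bits)
    return [q - w for q, w in zip(quantized, weights)]
```

Under this assumed scheme, each weight moves by at most half a grid step, so lowering `num_bits` increases the magnitude of the perturbation applied to the weights.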