Robotics 32
☆ FineCog-Nav: Integrating Fine-grained Cognitive Modules for Zero-shot Multimodal UAV Navigation CVPR 2026
UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods remain limited, as they often rely on large base models, generic prompts, and loosely coordinated modules. In this work, we propose FineCog-Nav, a top-down framework inspired by human cognition that organizes navigation into fine-grained modules for language processing, perception, attention, memory, imagination, reasoning, and decision-making. Each module is driven by a moderate-sized foundation model with role-specific prompts and structured input-output protocols, enabling effective collaboration and improved interpretability. To support fine-grained evaluation, we construct AerialVLN-Fine, a curated benchmark of 300 trajectories derived from AerialVLN, with sentence-level instruction-trajectory alignment and refined instructions containing explicit visual endpoints and landmark references. Experiments show that FineCog-Nav consistently outperforms zero-shot baselines in instruction adherence, long-horizon planning, and generalization to unseen environments. These results suggest the effectiveness of fine-grained cognitive modularization for zero-shot aerial navigation. Project page: https://smartdianlab.github.io/projects-FineCogNav.
comment: Accepted by CVPR 2026 Findings
☆ Semantic Area Graph Reasoning for Multi-Robot Language-Guided Search
Coordinating multi-robot systems (MRS) to search in unknown environments is particularly challenging for tasks that require semantic reasoning beyond geometric exploration. Classical coordination strategies rely on frontier coverage or information gain and cannot incorporate high-level task intent, such as searching for objects associated with specific room types. We propose \textit{Semantic Area Graph Reasoning} (SAGR), a hierarchical framework that enables Large Language Models (LLMs) to coordinate multi-robot exploration and semantic search through a structured semantic-topological abstraction of the environment. SAGR incrementally constructs a semantic area graph from a semantic occupancy map, encoding room instances, connectivity, frontier availability, and robot states into a compact task-relevant representation for LLM reasoning. The LLM performs high-level semantic room assignment based on spatial structure and task context, while deterministic frontier planning and local navigation handle geometric execution within assigned rooms. Experiments on the Habitat-Matterport3D dataset across 100 scenarios show that SAGR remains competitive with state-of-the-art exploration methods while consistently improving semantic target search efficiency, with up to 18.8\% in large environments. These results highlight the value of structured semantic abstractions as an effective interface between LLM-based reasoning and multi-robot coordination in complex indoor environments.
☆ DENALI: A Dataset Enabling Non-Line-of-Sight Spatial Reasoning with Low-Cost LiDARs
Consumer LiDARs in mobile devices and robots typically output a single depth value per pixel. Yet internally, they record full time-resolved histograms containing direct and multi-bounce light returns; these multi-bounce returns encode rich non-line-of-sight (NLOS) cues that can enable perception of hidden objects in a scene. However, severe hardware limitations of consumer LiDARs make NLOS reconstruction with conventional methods difficult. In this work, we motivate a complementary direction: enabling NLOS perception with low-cost LiDARs through data-driven inference. We present DENALI, the first large-scale real-world dataset of space-time histograms from low-cost LiDARs capturing hidden objects. We capture time-resolved LiDAR histograms for 72,000 hidden-object scenes across diverse object shapes, positions, lighting conditions, and spatial resolutions. Using our dataset, we show that consumer LiDARs can enable accurate, data-driven NLOS perception. We further identify key scene and modeling factors that limit performance, as well as simulation-fidelity gaps that hinder current sim-to-real transfer, motivating future work toward scalable NLOS vision with consumer LiDARs.
☆ SENSE: Stereo OpEN Vocabulary SEmantic Segmentation
Open-vocabulary semantic segmentation enables models to segment objects or image regions beyond fixed class sets, offering flexibility in dynamic environments. However, existing methods often rely on single-view images and struggle with spatial precision, especially under occlusions and near object boundaries. We propose SENSE, the first work on Stereo OpEN Vocabulary SEmantic Segmentation, which leverages stereo vision and vision-language models to enhance open-vocabulary semantic segmentation. By incorporating stereo image pairs, we introduce geometric cues that improve spatial reasoning and segmentation accuracy. Trained on the PhraseStereo dataset, our approach achieves strong performance in phrase-grounded tasks and demonstrates generalization in zero-shot settings. On PhraseStereo, we show a +2.9% improvement in Average Precision over the baseline method and +0.76% over the best competing method. SENSE also provides a relative improvement of +3.5% mIoU on Cityscapes and +18% on KITTI compared to the baseline work. By jointly reasoning over semantics and geometry, SENSE supports accurate scene understanding from natural language, essential for autonomous robots and Intelligent Transportation Systems.
☆ VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation
Diffusion policies are becoming mainstream in robotic manipulation but suffer from hard negative class imbalance due to uniform sampling and lack of sample difficulty awareness, leading to slow training convergence and frequent inference timeout failures. We propose VADF (Vision-Adaptive Diffusion Policy Framework), a vision-driven dual-adaptive framework that significantly reduces convergence steps and achieves early success in inference, with model-agnostic design enabling seamless integration into any diffusion policy architecture. During training, we introduce Adaptive Loss Network (ALN), a lightweight MLP-based loss predictor that quantifies per-step sample difficulty in real time. Guided by hard negative mining, it performs weighted sampling to prioritize high-loss regions, enabling adaptive weight updates and faster convergence. In inference, we design the Hierarchical Vision Task Segmenter (HVTS), which decomposes high-level task instructions into multi-stage low-level sub-instructions based on visual input. It adaptively segments action sequences into simple and complex subtasks by assigning shorter noise schedules with longer direct execution sequences to simple actions, and longer noise steps with shorter execution sequences to complex ones, thereby dramatically reducing computational overhead and significantly improving the early success rate.
☆ A Reconfigurable Pneumatic Joint Enabling Localized Selective Stiffening and Shape Locking in Vine-Inspired Robots
Vine-inspired robots achieve large workspace coverage through tip eversion, enabling safe navigation in confined and cluttered environments. However, their deployment in free space is fundamentally limited by low axial stiffness, poor load-bearing capacity, and the inability to retain shape during and after steering. In this work, we propose a reconfigurable pneumatic joint (RPJ) architecture that introduces discrete, pressure-tunable stiffness along the robot body without compromising continuous growth. Each RPJ module comprises symmetrically distributed pneumatic chambers that locally increase bending stiffness when pressurized, enabling decoupling between global compliance and localized rigidity. We integrate the RPJs into a soft growing robot with tendon-driven steering and develop a compact base station for mid-air eversion. System characterization and experimental validation demonstrate moderate pressure requirements for eversion, as well as comparable localized stiffening and steering performance to layer-jamming mechanisms. Demonstrations further show that the proposed robot achieves improved shape retention during bending, reduced gravitational deflection under load, cascading retraction, and reliable payload transport up to 202 g in free space. The RPJ mechanism establishes a practical pathway toward structurally adaptive vine robots for manipulation-oriented tasks such as object sorting and adaptive exploration in unconstrained environments.
comment: Original Article
☆ Robust Fleet Sizing for Multi-UAV Inspection Missions under Synchronized Replacement Demand
Multi-UAV inspection missions require spare drones to replace active drones during recharging cycles. Existing fleet-sizing approaches often assume steady-state operating conditions that do not apply to finite-horizon missions, or they treat replacement requests as statistically independent events. The latter provides per-request blocking guarantees that fail to translate to mission-level reliability when demands cluster. This paper identifies a structural failure mode where efficient routing assigns similar workloads to each UAV, leading to synchronized battery depletion and replacement bursts that exhaust the spare pool even when average capacity is sufficient.
We derive a closed-form sufficient fleet-sizing rule, k = m(ceil(R) + 1), where m is the number of active UAVs and R is the recovery-to-active time ratio. This additive buffer of m spares absorbs worst-case synchronized demand at recovery-cycle boundaries and ensures mission-level reliability even when all UAVs deplete simultaneously. Monte Carlo validation across five scenarios (m in [2, 10], R in [0.87, 3.39], 1000 trials each) shows that Erlang-B sizing with a per-request blocking target epsilon = 0.01 drops to 69.9% mission success at R = 3.39, with 95% of spare exhaustion events concentrated in the top-decile 5-minute demand windows. In contrast, the proposed rule maintains 99.8% success (Wilson 95% lower bound 99.3%) across all tested conditions, including wind variability up to CV = 0.30, while requiring only four additional drones in the most demanding scenario.
☆ DTEA: A Dual-Topology Elastic Actuator Enabling Real-Time Switching Between Series and Parallel Compliance
Series and parallel elastic actuators offer complementary but mutually exclusive advantages, yet no existing actuator enables real-time transition between these topologies during operation. This paper presents a novel actuator design called the Dual-Topology Elastic Actuator (DTEA), which enables dynamic switching between SEA and PEA topologies during operation. A proof-of-concept prototype of the DTEA is developed to demonstrate the feasibility of the topology-switching mechanism. Experiments are conducted to evaluate the robustness and timing of the switching mechanism under operational conditions. The actuator successfully performed 324 topology-switching cycles under load without damage, demonstrating the robustness of the mechanism. The measured switching time between SEA and PEA modes is under 33.33 ms. Additional experiments are conducted to characterize the static stiffness and disturbance rejection performance in both SEA and PEA modes. Static stiffness tests show that the PEA mode is 1.53x stiffer than the SEA mode, with KSEA = 5.57 +/- 0.02 Nm/rad and KPEA = 8.54 +/- 0.02 Nm/rad. Disturbance rejection experiments show that the mean peak deflection in SEA mode is 2.26x larger than in PEA mode (5.2 deg vs. 2.3 deg), while the mean settling time is 3.45x longer (1380 ms vs. 400 ms). The observed behaviors are consistent with the known characteristics of conventional SEA and PEA actuators, validating the functionality of both modes in the DTEA actuator.
☆ Environment-Adaptive Solid-State LiDAR-Inertial Odometry
Solid-state LiDAR-inertial SLAM has attracted significant attention due to its advantages in speed and robustness. However, achieving accurate mapping in extreme environments remains challenging due to severe geometric degeneracy and unreliable observations, which often lead to ill-conditioned optimization and map inconsistencies. To address these challenges, we propose an environment-adaptive solid-state LiDAR-inertial odometry that integrates local normal-vector constraints with degeneracy-aware map maintenance to enhance localization accuracy. Specifically, we introduce local normal-vector constraints to improve the stability of state estimation, effectively suppressing localization drift in degenerate scenarios. Furthermore, we design a degeneration-guided map update strategy to improve map precision. Benefiting from the refined map representation, localization accuracy is further enhanced in subsequent estimation. Experimental results demonstrate that the proposed method achieves superior mapping accuracy and robustness in extreme and perceptually degraded environments, with an average RMSE reduction of up to 12.8% compared to the baseline method.
☆ Limits of Lamarckian Evolution Under Pressure of Morphological Novelty
Lamarckian inheritance has been shown to be a powerful accelerator in systems where the joint evolution of robot morphologies and controllers is enhanced with individual learning. Its defining advantage lies in the offspring inheriting controllers learned by their parents. The efficacy of this option, however, relies on morphological similarity between parent and offspring. In this study, we examine how Lamarckian inheritance performs when the search process is driven toward high morphological variance, potentially straining the requirement for parent-offspring similarity. Using a system of modular robots that can evolve and learn to solve a locomotion task, we compare Darwinian and Lamarckian evolution to determine how they respond to shifting from pure task-based selection to a multi-objective pressure that also rewards morphological novelty. Our results confirm that Lamarckian evolution outperforms Darwinian evolution when optimizing task-performance alone. However, introducing selection pressure for morphological diversity causes a substantial performance drop, which is much greater in the Lamarckian system. Further analyses show that promoting diversity reduces parent-offspring similarity, which in turn reduces the benefits of inheriting controllers learned by parents. These results reveal the limits of Lamarckian evolution by exposing a fundamental trade-off between inheritance-based exploitation and diversity-driven exploration.
comment: 8 pages, 7 figures, Submitted to WCCI 2026
☆ Continual Hand-Eye Calibration for Open-world Robotic Manipulation
Hand-eye calibration through visual localization is a critical capability for robotic manipulation in open-world environments. However, most deep learning-based calibration models suffer from catastrophic forgetting when adapting into unseen data amongst open-world scene changes, while simple rehearsal-based continual learning strategy cannot well mitigate this issue. To overcome this challenge, we propose a continual hand-eye calibration framework, enabling robots to adapt to sequentially encountered open-world manipulation scenes through spatially replay strategy and structure-preserving distillation. Specifically, a Spatial-Aware Replay Strategy (SARS) constructs a geometrically uniform replay buffer that ensures comprehensive coverage of each scene pose space, replacing redundant adjacent frames with maximally informative viewpoints. Meanwhile, a Structure-Preserving Dual Distillation (SPDD) is proposed to decompose localization knowledge into coarse scene layout and fine pose precision, and distills them separately to alleviate both types of forgetting during continual adaptation. As a new manipulation scene arrives, SARS provides geometrically representative replay samples from all prior scenes, and SPDD applies structured distillation on these samples to retain previously learned knowledge. After training on the new scene, SARS incorporates selected samples from the new scene into the replay buffer for future rehearsal, allowing the model to continuously accumulate multi-scene calibration capability. Experiments on multiple public datasets show significant anti scene forgetting performance, maintaining accuracy on past scenes while preserving adaptation to new scenes, confirming the effectiveness of the framework.
☆ From Seeing to Simulating: Generative High-Fidelity Simulation with Digital Cousins for Generalizable Robot Learning and Evaluation
Jasper Lu, Zhenhao Shen, Yuanfei Wang, Shugao Liu, Shengqiang Xu, Shawn Xie, Jingkai Xu, Feng Jiang, Jade Yang, Chen Xie, Ruihai Wu
Learning robust robot policies in real-world environments requires diverse data augmentation, yet scaling real-world data collection is costly due to the need for acquiring physical assets and reconfiguring environments. Therefore, augmenting real-world scenes into simulation has become a practical augmentation for efficient learning and evaluation. We present a generative framework that establishes a generative real-to-sim mapping from real-world panoramas to high-fidelity simulation scenes, and further synthesize diverse cousin scenes via semantic and geometric editing. Combined with high-quality physics engines and realistic assets, the generated scenes support interactive manipulation tasks. Additionally, we incorporate multi-room stitching to construct consistent large-scale environments for long-horizon navigation across complex layouts. Experiments demonstrate a strong sim-to-real correlation validating our platform's fidelity, and show that extensively scaling up data generation leads to significantly better generalization to unseen scene and object variations, demonstrating the effectiveness of Digital Cousins for generalizable robot learning and evaluation.
☆ Fuzzy Logic Theory-based Adaptive Reward Shaping for Robust Reinforcement Learning (FARS)
Reinforcement learning (RL) often struggles in real-world tasks with high-dimensional state spaces and long horizons, where sparse or fixed rewards severely slow down exploration and cause agents to get trapped in local optima. This paper presents a fuzzy logic based reward shaping method that integrates human intuition into RL reward design. By encoding expert knowledge into adaptive and interpreable terms, fuzzy rules promote stable learning and reduce sensitivity to hyperparameters. The proposed method leverages these properties to adapt reward contributions based on the agent state, enabling smoother transitions between fast motion and precise control in challenging navigation tasks. Extensive simulation results on autonomous drone racing benchmarks show stable learning behavior and consistent task performance across scenarios of increasing difficulty. The proposed method achieves faster convergence and reduced performance variability across training seeds in more challenging environments, with success rates improving by up to approximately 5 percent compared to non fuzzy reward formulations.
comment: 6 pages, 5 figures
☆ PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding
Accurate open-vocabulary 3D scene understanding requires semantic representations that are both language-aligned and spatially precise at the pixel level, while remaining scalable when lifted to 3D space. However, existing representations struggle to jointly satisfy these requirements, and densely propagating pixel-wise semantics to 3D often results in substantial redundancy, leading to inefficient storage and querying in large-scale scenes. To address these challenges, we present \emph{PLAF}, a Pixel-wise Language-Aligned Feature extraction framework that enables dense and accurate semantic alignment in 2D without sacrificing open-vocabulary expressiveness. Building upon this representation, we further design an efficient semantic storage and querying scheme that significantly reduces redundancy across both 2D and 3D domains. Experimental results show that \emph{PLAF} provides a strong semantic foundation for accurate and efficient open-vocabulary 3D scene understanding. The codes are publicly available at https://github.com/RockWenJJ/PLAF.
comment: Accepted by ICCA 2026
☆ Long-Term Memory for VLA-based Agents in Open-World Task Execution
Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat planning and execution as decoupled processes, often failing to consolidate successful strategies, which results in inefficient trial-and-error in multi-stage protocols. In this paper, we propose ChemBot, a dual-layer, closed-loop framework that integrates an autonomous AI agent with a progress-aware VLA model (Skill-VLA) for hierarchical task decomposition and execution. ChemBot utilizes a dual-layer memory architecture to consolidate successful trajectories into retrievable assets, while a Model Context Protocol (MCP) server facilitates efficient sub-agent and tool orchestration. To address the inherent limitations of VLA models, we further implement a future-state-based asynchronous inference mechanism to mitigate trajectory discontinuities. Extensive experiments on collaborative robots demonstrate that ChemBot achieves superior operational safety, precision, and task success rates compared to existing VLA baselines in complex, long-horizon chemical experimentation.
☆ Contact-Aware Planning and Control of Continuum Robots in Highly Constrained Environments
Continuum robots are well suited for navigating confined and fragile environments, such as vascular or endoluminal anatomy, where contact with surrounding structures is often unavoidable. While controlled contact can assist motion, unfavorable contact can degrade controllability, induce kinematic singularities, or introduce safety risks. We present a contact-aware planning approach that evaluates contact quality, penalizing hazardous interactions, while permitting benign contact. The planner produces kinematically feasible trajectories and contact-aware Jacobians which can be used for closed-loop control in hardware experiments. We validate the approach by testing the integrated system (planning, control, and mechanical design) on anatomical models from patient scans. The planner generates effective plans for three common anatomical environments, and, in all hardware trials, the continuum robot was able to reach the target while avoiding dangerous tip contact (100% success). Mean tracking errors were 1.9 +/- 0.5 mm, 1.2 +/- 0.1 mm, and 1.7 +/- 0.2 mm across the three different environments. Ablation studies showed that penalizing end-of-continuum-segment (ECS) contact improved manipulability and prevented hardware failures. Overall, this work enables reliable, contact-aware navigation in highly constrained environments.
comment: 15 pages, 3 figures
☆ Factor Graph-Based Shape Estimation for Continuum Robots via Magnus Expansion
Reconstructing the shape of continuum manipulators from sparse, noisy sensor data is a challenging task, owing to the infinite-dimensional nature of such systems. Existing approaches broadly trade off between parametric methods that yield compact state representations but lack probabilistic structure, and Cosserat rod inference on factor graphs, which provides principled uncertainty quantification at the cost of a state dimension that grows with the spatial discretization. This letter combines the strength of both paradigms by estimating the coefficients of a low-dimensional Geometric Variable Strain (GVS) parameterization within a factor graph framework. A novel kinematic factor, derived from the Magnus expansion of the strain field, encodes the closed-form rod geometry as a prior constraint linking the GVS strain coefficients to the backbone pose variables. The resulting formulation yields a compact state vector directly amenable to model-based control, while retaining the modularity, probabilistic treatment and computational efficiency of factor graph inference. The proposed method is evaluated in simulation on a 0.4 m long tendon-driven continuum robot under three measurement configurations, achieving mean position errors below 2 mm for all three scenarios and demonstrating a sixfold reduction in orientation error compared to a Gaussian process regression baseline when only position measurements are available.
☆ GaussianFlow SLAM: Monocular Gaussian Splatting SLAM Guided by GaussianFlow
Gaussian splatting has recently gained traction as a compelling map representation for SLAM systems, enabling dense and photo-realistic scene modeling. However, its application to monocular SLAM remains challenging due to the lack of reliable geometric cues from monocular input. Without geometric supervision, mapping or tracking could fall in local-minima, resulting in structural degeneracies and inaccuracies. To address this challenge, we propose GaussianFlow SLAM, a monocular 3DGS-SLAM that leverages optical flow as a geometry-aware cue to guide the optimization of both the scene structure and camera poses. By encouraging the projected motion of Gaussians, termed GaussianFlow, to align with the optical flow, our method introduces consistent structural cues to regularize both map reconstruction and pose estimation. Furthermore, we introduce normalized error-based densification and pruning modules to refine inactive and unstable Gaussians, thereby contributing to improved map quality and pose accuracy. Experiments conducted on public datasets demonstrate that our method achieves superior rendering quality and tracking accuracy compared with state-of-the-art algorithms. The source code is available at: https://github.com/url-kaist/gaussianflow-slam.
comment: 8 pages, 5 figures, 7 tables, accepted to IEEE RA-L
♻ ☆ COVER:COverage-VErified Roadmaps for Fixed-time Motion Planning in Continuous Semi-Static Environments
The ability to solve motion-planning queries within a fixed time budget is critical for deploying robotic systems in time-sensitive applications. Semi-static environments, where most of the workspace remains fixed while a subset of obstacles varies between tasks, exhibit structured variability that can be exploited to provide stronger guarantees than general-purpose planners. However, existing approaches either lack formal coverage guarantees or rely on discretizations of obstacle configurations that restrict applicability to realistic domains.
This paper introduces COVER, a framework that incrementally constructs coverage-verified roadmaps for semi-static environments. COVER decomposes the arrangement space by independently partitioning the configuration space of each movable obstacle and verifies roadmap feasibility within each partition, enabling fixed-time query resolution for verified regions.We evaluate COVER on a 7-DoF manipulator performing object-picking in tabletop and shelf environments, demonstrating broader problem-space coverage and higher query success rates than prior work, particularly with obstacles of different sizes.
♻ ☆ Angle-based Localization and Rigidity Maintenance Control for Multi-Robot Networks
In this work, we study angle-based localization and rigidity maintenance control for multi-robot networks. First, we establish the relationship between angle rigidity and bearing rigidity considering \textit{directed} sensing graphs and \textit{body-frame} bearing measurements in both $2$ and $3$-\textit{dimensional space}. In particular, we demonstrate that a framework in $\mathrm{SE}(d)$ is infinitesimally bearing rigid if and only if it is infinitesimally angle rigid and each robot obtains at least $d-1$ bearing measurements ($d \in \{2, 3\}$). Building on these findings, this paper proposes a distributed angle-based localization scheme and establishes local exponential stability under switching sensing graphs, requiring only infinitesimal angle rigidity across the visited topologies. Then, since the set of available angles strongly depends on the robots' spatial configuration due to sensing constraints, we investigate rigidity maintenance control. The \textit{angle rigidity eigenvalue} is presented as a metric for the degree of rigidity. A decentralized gradient-based controller capable of executing mission-specific commands while maintaining a sufficient level of angle rigidity is proposed. Simulations were conducted to evaluate the scheme's effectiveness and practicality.
♻ ☆ Robust Real-Time Coordination of CAVs: A Distributed Optimization Framework under Uncertainty
Achieving both safety guarantees and real-time performance in cooperative vehicle coordination remains a fundamental challenge, particularly in dynamic and uncertain environments. Existing methods often suffer from insufficient uncertainty treatment in safety modeling, which intertwines with the heavy computational burden under complex multi-vehicle coupling. This paper presents a novel coordination framework that resolves this challenge through three key innovations: 1) direct control of vehicles' trajectory distributions during coordination, formulated as a robust cooperative planning problem with adaptive enhanced safety constraints, ensuring a specified level of safety regarding the uncertainty of the interactive trajectory, 2) a fully parallel ADMM-based distributed trajectory negotiation (ADMM-DTN) algorithm that efficiently solves the optimization problem while allowing configurable negotiation rounds to balance solution quality and computational resources, and 3) an interactive attention mechanism that selectively focuses on critical interactive participants to further enhance computational efficiency. Simulation results demonstrate that our framework achieves significant advantages in safety (reducing collision rates by up to 40.79\% in various scenarios) and real-time performance compared to representative benchmarks, while maintaining strong scalability with increasing vehicle numbers. The proposed interactive attention mechanism further reduces the computational demand by 15.4\%. Real-world experiments further validate robustness and real-time feasibility with unexpected dynamic obstacles, demonstrating reliable coordination in complex traffic scenes. The experiment demo could be found at https://youtu.be/4PZwBnCsb6Q.
comment: Accept by IEEE TVT
♻ ☆ Sampling-Based Multi-Modal Multi-Robot Multi-Goal Path Planning
In many robotics applications, multiple robots are working in a shared workspace to complete a set of tasks as fast as possible. Such settings can be treated as multi-modal multi-robot multi-goal path planning problems, where each robot has to reach a set of goals. Existing approaches to this type of problem solve this using prioritization or assume synchronous task completion, and are thus neither optimal nor complete. We formalize this problem as a single centralized path planning problem and present planners that are probabilistically complete and asymptotically optimal. The planners plan in the composite space of all robots and are modifications of standard sampling-based planners with the required changes to work in our multi-modal, multi-robot, multi-goal setting. We validate the planners on a diverse range of problems including scenarios with various robots, planning horizons, and collaborative tasks such as handovers, and compare the planners against a suboptimal prioritized planner.
Videos and code for the planners and the benchmark is available at https://vhartmann.com/mrmg-planning/.
comment: 25 pages, 17 figures
♻ ☆ Soft Electroadhesive Feet for Micro Aerial Robots Perching on Smooth and Curved Surfaces
Electroadhesion (EA) provides electrically switchable adhesion and is a promising mechanism for perching micro aerial robots on smooth surfaces. However, practical implementations of soft and stretchable EA pads for aerial perching remain limited. This work presents (i) an efficient workflow for fabricating soft, stretchable electroadhesive pads with sinusoidal wave and concentric-circle electrodes in multiple sizes, (ii) a controlled experimental comparison of normal and shear adhesion under inactive (0 kV) and active (4.8 kV) conditions using an Instron-based setup, and (iii) a perching demonstration using a Crazyflie quadrotor equipped with electroadhesive feet on flat and curved substrates. Experimental results show that shear adhesion dominates, reaching forces on the order of 3 N with partial pad contact, while normal adhesion is comparatively small and strongly dependent on substrate properties. The Crazyflie prototype demonstrates repeatable attachment on smooth plastic surfaces, including curved geometries, as well as rapid detachment when the voltage is removed. These results highlight the potential of soft electroadhesive feet for lightweight and reliable perching in micro aerial vehicles (MAVs).
comment: 7 pages, 8 figures
♻ ☆ ArrayTac: A Closed-loop Piezoelectric Tactile Platform for Continuously Tunable Rendering of Shape, Stiffness, and Friction
Human touch depends on the integration of shape, stiffness, and friction, yet existing tactile displays cannot render these cues together as continuously tunable, high-fidelity signals for intuitive perception. We present ArrayTac, a closed-loop piezoelectric tactile display that simultaneously renders these three dimensions with continuous tunability on a 4 by 4 actuator array. Each unit integrates a three-stage micro-lever amplifier with end-effector Hall-effect feedback, enabling up to 5 mm displacement, greater than 500 Hz array refresh, and 123 Hz closed-loop bandwidth. In psychophysical experiments, naive participants identified three-dimensional shapes and distinguished multiple stiffness and friction levels through touch alone without training. We further demonstrate image-to-touch rendering from an RGB image and remote palpation of a medical-grade breast tumor phantom over 1,000 km, in which all 11 naive participants correctly identified tumor number and type with sub-centimeter localization error. These results establish ArrayTac as a platform for multidimensional haptic rendering and interaction.
comment: Project website: https://arraytac.github.io/
♻ ☆ Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control
Qingrui Zhao, Kaiyue Yang, Xiyu Wang, Shiqi Zhao, Yi Lu, Xinfang Zhang, Wei Yin, Qiu Shen, Xiao-Xiao Long, Xun Cao
Humanoid robots require diverse motor skills to integrate into complex environments, but bridging the kinematic and dynamic embodiment gap from human data remains a major bottleneck. We demonstrate through Hessian analysis that traditional optimization-based retargeting is inherently non-convex and prone to local optima, leading to physical artifacts like joint jumps and self-penetration. To address this, we reformulate the targeting problem as learning data distribution rather than optimizing optimal solutions, where we propose NMR, a Neural Motion Retargeting framework that transforms static geometric mapping into a dynamics-aware learned process. We first propose Clustered-Expert Physics Refinement (CEPR), a hierarchical data pipeline that leverages VAE-based motion clustering to group heterogeneous movements into latent motifs. This strategy significantly reduces the computational overhead of massively parallel reinforcement learning experts, which project and repair noisy human demonstrations onto the robot's feasible motion manifold. The resulting high-fidelity data supervises a non-autoregressive CNN-Transformer architecture that reasons over global temporal context to suppress reconstruction noise and bypass geometric traps. Experiments on the Unitree G1 humanoid across diverse dynamic tasks (e.g., martial arts, dancing) show that NMR eliminates joint jumps and significantly reduces self-collisions compared to state-of-the-art baselines. Furthermore, NMR-generated references accelerate the convergence of downstream whole-body control policies, establishing a scalable path for bridging the human-robot embodiment gap.
comment: Report, 12 pages, 5 figures, 4 tables, webpage: https://nju3dv-humanoidgroup.github.io/nmr.github.io
♻ ☆ CLAW: Composable Language-Annotated Whole-body Motion Generation
Training language-conditioned whole-body controllers for humanoid robots demands large-scale motion-language datasets. Existing approaches based on motion capture are costly and limited in diversity, while text-to-motion generative models produce purely kinematic outputs that are not guaranteed to be physically feasible. We present CLAW, a pipeline for scalable generation of language-annotated whole-body motion data for the Unitree G1 humanoid robot. CLAW composes motion primitives from a kinematic planner, parameterized by movement, heading, speed, pelvis height, and duration, and provides two browser-based interfaces--a real-time keyboard mode and a timeline-based sequence editor--for exploratory and batch data collection. A low-level controller tracks these references in MuJoCo simulation, yielding physically grounded trajectories. In parallel, a template-based engine generates diverse natural-language annotations at both segment and trajectory levels. To support scalable generation of motion-language paired data for humanoid robot learning, we make our system publicly available at: https://github.com/JianuoCao/CLAW
♻ ☆ AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Zhenlong Yuan, Chengxuan Qian, Jing Tang, Rui Chen, Zijian Song, Lei Sun, Xiangxiang Chu, Yujun Cai, Dapeng Zhang, Shuo Li
Vision-Language-Action (VLA) models in autonomous driving systems have recently demonstrated transformative potential by integrating multimodal perception with decision-making capabilities. However, the interpretability and coherence of the decision process and the plausibility of action sequences remain largely underexplored. To address these issues, we propose AutoDrive-R$^2$, a novel VLA framework that enhances both reasoning and self-reflection capabilities of autonomous driving systems through chain-of-thought (CoT) processing and reinforcement learning (RL). Specifically, we first propose an innovative CoT dataset named nuScenesR$^2$-6K for supervised fine-tuning, which effectively builds cognitive bridges between input information and output trajectories through a four-step logical chain with self-reflection for validation. Moreover, to maximize both reasoning and self-reflection during the RL stage, we further employ the Group Relative Policy Optimization (GRPO) algorithm within a physics-grounded reward framework that incorporates spatial alignment, vehicle dynamic, and temporal smoothness criteria to ensure reliable and realistic trajectory planning. Extensive evaluation results across both nuScenes and Waymo datasets demonstrates the state-of-the-art performance and robust generalization capacity of our proposed method.
♻ ☆ Scalable Multi-Task Learning through Spiking Neural Networks with Adaptive Task-Switching Policy for Intelligent Autonomous Agents
Training resource-constrained autonomous agents on multiple tasks simultaneously is crucial for adapting to diverse real-world environments. Recent works employ reinforcement learning (RL) approach, but they still suffer from sub-optimal multi-task performance due to task interference. State-of-the-art works employ Spiking Neural Networks (SNNs) to improve RL-based multi-task learning and enable low-power/energy operations through network enhancements and spike-driven data stream processing. However, they rely on fixed task-switching intervals during its training, thus limiting its performance and scalability. To address this, we propose SwitchMT, a novel methodology that employs adaptive task-switching for effective, scalable, and simultaneous multi-task learning. SwitchMT employs the following key ideas: (1) leveraging a Deep Spiking Q-Network with active dendrites and dueling structure, that utilizes task-specific context signals to create specialized sub-networks; and (2) devising an adaptive task-switching policy that leverages both rewards and internal dynamics of the network parameters. Experimental results demonstrate that SwitchMT achieves competitive scores in multiple Atari games (i.e., Pong: -8.8, Breakout: 5.6, and Enduro: 355.2) and longer game episodes as compared to the state-of-the-art. These results also highlight the effectiveness of SwitchMT methodology in addressing task interference without increasing the network complexity, enabling intelligent autonomous agents with scalable multi-task learning capabilities.
comment: Accepted at the 63rd ACM/IEEE Design Automation Conference (DAC), July 26-29, 2026 in Long Beach, CA, USA. [Codes: https://github.com/rachmadvwp/SwitchMT]
♻ ☆ Scalable Unseen Objects 6-DoF Absolute Pose Estimation with Robotic Integration
Pose estimation-guided unseen object 6-DoF robotic manipulation is a key task in robotics. However, the scalability of current pose estimation methods to unseen objects remains a fundamental challenge, as they generally rely on CAD models or dense reference views of unseen objects, which are difficult to acquire, ultimately limit their scalability. In this paper, we introduce a novel task setup, referred to as SinRef-6D, which addresses 6-DoF absolute pose estimation for unseen objects using only a single pose-labeled reference RGB-D image captured during robotic manipulation. This setup is more scalable yet technically nontrivial due to large pose discrepancies and the limited geometric and spatial information contained in a single view. To address these issues, our key idea is to iteratively establish point-wise alignment in a common coordinate system with state space models (SSMs) as backbones. Specifically, to handle large pose discrepancies, we introduce an iterative object-space point-wise alignment strategy. Then, Point and RGB SSMs are proposed to capture long-range spatial dependencies from a single view, offering superior spatial modeling capability with linear complexity. Once pre-trained on synthetic data, SinRef-6D can estimate the 6-DoF absolute pose of an unseen object using only a single reference view. With the estimated pose, we further develop a hardware-software robotic system and integrate the proposed SinRef-6D into it in real-world settings. Extensive experiments on six benchmarks and in diverse real-world scenarios demonstrate that our SinRef-6D offers superior scalability. Additional robotic grasping experiments further validate the effectiveness of the developed robotic system. The code and robotic demos are available at https://paperreview99.github.io/SinRef-6DoF-Robotic.
comment: Accepted by TRO 2026, 18 pages, 9 figures
♻ ☆ Optimal Solutions for the Moving Target Vehicle Routing Problem with Obstacles via Lazy Branch and Price
The Moving Target Vehicle Routing Problem with Obstacles (MT-VRP-O) seeks trajectories for several agents that collectively intercept a set of moving targets. Each target has one or more time windows where it must be visited, and the agents must avoid static obstacles and satisfy speed and capacity constraints. We introduce Lazy Branch-and-Price with Relaxed Continuity (Lazy BPRC), which finds optimal solutions for the MT-VRP-O. Lazy BPRC applies the branch-and-price framework for VRPs, which alternates between a restricted master problem (RMP) and a pricing problem. The RMP aims to select a sequence of target-time window pairings (called a tour) for each agent to follow, from a limited subset of tours. The pricing problem adds tours to the limited subset. Conventionally, solving the RMP requires computing the cost for an agent to follow each tour in the limited subset. Computing these costs in the MT-VRP-O is computationally intensive, since it requires collision-free motion planning between moving targets. Lazy BPRC defers cost computations by solving the RMP using lower bounds on the costs of each tour, computed via motion planning with relaxed continuity constraints. We lazily evaluate the true costs of tours as-needed. We compute a tour's cost by searching for a shortest path on a Graph of Convex Sets (GCS), and we accelerate this search using our continuity relaxation method. We demonstrate that Lazy BPRC runs up to an order of magnitude faster than two ablations.
♻ ☆ Optimal Solutions for the Moving Target Vehicle Routing Problem via Branch-and-Price with Relaxed Continuity ICAPS 2026
The Moving Target Vehicle Routing Problem (MT-VRP) seeks trajectories for several agents that intercept a set of moving targets, subject to speed, time window, and capacity constraints. We introduce an exact algorithm, Branch-and-Price with Relaxed Continuity (BPRC), for the MT-VRP. The main challenge in a branch-and-price approach for the MT-VRP is the pricing subproblem, which is complicated by moving targets and time-dependent travel costs between targets. Our key contribution is a new labeling algorithm that solves this subproblem by means of a novel dominance criterion tailored for problems with moving targets. Numerical results on instances with up to 25 targets show that our algorithm finds optimal solutions more than an order of magnitude faster than a baseline based on previous work, showing particular strength in scenarios with limited agent capacities.
comment: Accepted to ICAPS 2026
♻ ☆ Special Unitary Parameterized Estimators of Rotation ICLR 2026
This paper revisits the topic of rotation estimation through the lens of special unitary matrices. We begin by reformulating Wahba's problem using $SU(2)$ to derive multiple solutions that yield linear constraints on corresponding quaternion parameters. We then explore applications of these constraints by formulating efficient methods for related problems. Finally, from this theoretical foundation, we propose two novel continuous representations for learning rotations in neural networks. Extensive experiments validate the effectiveness of the proposed methods.
comment: Final version to be published at ICLR 2026; 33 pages