Robotics 45
☆ GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation
In this paper, we propose a training-free framework for vision-and-language
navigation (VLN). Existing zero-shot VLN methods are mainly designed for
discrete environments or involve unsupervised training in continuous simulator
environments, which makes it challenging to generalize and deploy them in
real-world scenarios. To achieve a training-free framework in continuous
environments, our framework formulates navigation guidance as graph constraint
optimization by decomposing instructions into explicit spatial constraints. The
constraint-driven paradigm decodes spatial semantics through constraint
solving, enabling zero-shot adaptation to unseen environments. Specifically, we
construct a spatial constraint library covering all types of spatial
relationships mentioned in VLN instructions. The human instruction is decomposed
into a directed acyclic graph with waypoint nodes, object nodes, and edges,
which are used as queries to the library to retrieve the corresponding graph
constraints. A constraint solver then solves the resulting optimization problem
to determine the positions of waypoints, yielding the robot's
navigation path and final goal. To handle cases with no solution or multiple
solutions, we construct a navigation tree with a backtracking mechanism.
Extensive experiments on standard benchmarks demonstrate significant
improvements in success rate and navigation efficiency compared to
state-of-the-art zero-shot VLN methods. We further conduct real-world
experiments to show that our framework can effectively generalize to new
environments and instruction sets, paving the way for a more robust and
autonomous navigation framework.
comment: Accepted to CoRL 2025. Project page:
https://bagh2178.github.io/GC-VLN/
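The idea of placing a waypoint by solving spatial constraints can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the relation names, the distance threshold, the fixed world frame used for "left of", and the brute-force grid enumeration that stands in for a real constraint solver. It is not the GC-VLN implementation.

```python
import math

# Hypothetical constraint library: relation name -> predicate over
# (waypoint_xy, object_xy). Names and thresholds are assumed, not from the paper.
CONSTRAINT_LIBRARY = {
    "near": lambda w, o: math.dist(w, o) <= 1.5,
    "left_of": lambda w, o: w[0] < o[0],  # smaller x in a fixed world frame
}

def solve_waypoint(edges, objects, grid_size=10, step=0.5):
    """Return the first grid cell satisfying every (relation, object) edge, or None."""
    for i in range(grid_size):
        for j in range(grid_size):
            w = (i * step, j * step)
            if all(CONSTRAINT_LIBRARY[rel](w, objects[obj]) for rel, obj in edges):
                return w
    return None  # no solution: a caller could backtrack to another branch

objects = {"sofa": (2.0, 2.0), "table": (3.0, 1.0)}
edges = [("near", "sofa"), ("left_of", "table")]
waypoint = solve_waypoint(edges, objects)

# Contradictory constraints (near the sofa and near a distant lamp) are unsatisfiable.
objects_far = dict(objects, lamp=(50.0, 50.0))
unsolved = solve_waypoint([("near", "sofa"), ("near", "lamp")], objects_far)
```

Returning None for an unsatisfiable constraint set is the kind of case where the abstract's navigation tree and backtracking would come into play.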
☆ Coordinated Motion Planning of a Wearable Multi-Limb System for Enhanced Human-Robot Interaction IROS 2023
Supernumerary Robotic Limbs (SRLs) can enhance human capability within close
proximity. However, because an SRL is a wearable device, the moment generated by its
operation acts on the human body as an external torque. When the moment
increases, more muscle units are activated for balancing, which can
reduce the muscular null space. Therefore, this paper proposes the concept of a
motion planning layer that reduces the generated moment for enhanced
Human-Robot Interaction. It modifies given trajectories with desirable angular
acceleration and position deviation limits. Its ability to reduce the
moment is demonstrated in simulation using simplified human and
robotic system models.
comment: Presented at the IROS 2023 Workshop (Multilimb Coordination in Human
Neuroscience and Robotics: Classical and Learning Perspectives)
☆ DECAMP: Towards Scene-Consistent Multi-Agent Motion Prediction with Disentangled Context-Aware Pre-Training
Trajectory prediction is a critical component of autonomous driving,
essential for ensuring both safety and efficiency on the road. However,
traditional approaches often struggle with the scarcity of labeled data and
exhibit suboptimal performance in multi-agent prediction scenarios. To address
these challenges, we introduce a disentangled context-aware pre-training
framework for multi-agent motion prediction, named DECAMP. Unlike existing
methods that entangle representation learning with pretext tasks, our framework
decouples behavior pattern learning from latent feature reconstruction,
prioritizing interpretable dynamics and thereby enhancing scene representation
for downstream prediction. Additionally, our framework incorporates
context-aware representation learning alongside collaborative spatial-motion
pretext tasks, which enables joint optimization of structural and intentional
reasoning while capturing the underlying dynamic intentions. Our experiments on
the Argoverse 2 benchmark showcase the superior performance of our method, and
the results attained underscore its effectiveness in multi-agent motion
forecasting. To the best of our knowledge, this is the first context
autoencoder framework for multi-agent motion forecasting in autonomous driving.
The code and models will be made publicly available.
☆ Mutual Information Tracks Policy Coherence in Reinforcement Learning
Reinforcement Learning (RL) agents deployed in real-world environments face
degradation from sensor faults, actuator wear, and environmental shifts, yet
lack intrinsic mechanisms to detect and diagnose these failures. We present an
information-theoretic framework that both reveals the fundamental dynamics of
RL and provides practical methods for diagnosing deployment-time anomalies.
Through analysis of state-action mutual information patterns in a robotic
control task, we first demonstrate that successful learning exhibits
characteristic information signatures: mutual information between states and
actions steadily increases from 0.84 to 2.83 bits (238% growth) despite growing
state entropy, indicating that agents develop increasingly selective attention
to task-relevant patterns. Intriguingly, the joint mutual information between
state-action pairs and next states, MI(S,A;S'), follows an inverted U-curve, peaking during
early learning before declining as the agent specializes, suggesting a
transition from broad exploration to efficient exploitation. More immediately
actionable, we show that information metrics can differentially diagnose system
failures: observation-space (i.e., state) noise (sensor faults) produces broad
collapses across all information channels with pronounced drops in state-action
coupling, while action-space noise (actuator faults) selectively disrupts
action-outcome predictability while preserving state-action relationships. This
differential diagnostic capability, demonstrated through controlled perturbation
experiments, enables precise fault localization without architectural
modifications or performance degradation. By establishing information patterns
as both signatures of learning and diagnostics for system health, we provide the
foundation for adaptive RL systems capable of autonomous fault detection and
policy adjustment based on information-theoretic principles.
comment: 10 pages, 4 figures, 1 table
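As a rough illustration of the state-action mutual information discussed above, the sketch below computes a plug-in estimate of MI(S;A) in bits from discretized (state, action) samples. The function name, the discretization, and the toy data are assumptions for illustration. The paper's actual estimator is not described in the abstract.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Plug-in estimate of I(S;A) in bits from a list of (state, action) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    count_s = Counter(s for s, _ in pairs)
    count_a = Counter(a for _, a in pairs)
    mi = 0.0
    for (s, a), c in joint.items():
        # p(s,a) * log2(p(s,a) / (p(s) * p(a))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_s[s] * count_a[a]))
    return mi

# Toy data: a policy whose action is fully determined by a binary state,
# and one whose action is independent of the state.
deterministic = [(0, "left"), (1, "right")] * 50
independent = [(s, a) for s in (0, 1) for a in ("left", "right")] * 25

mi_deterministic = mutual_information_bits(deterministic)
mi_independent = mutual_information_bits(independent)
```

A policy that maps each of two equally likely states to its own action carries one bit of state-action information, while a state-independent policy carries none. A drop in this quantity is the kind of signal the abstract associates with sensor faults.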
☆ TASC: Task-Aware Shared Control for Teleoperated Manipulation
We present TASC, a Task-Aware Shared Control framework for teleoperated
manipulation that infers task-level user intent and provides assistance
throughout the task. To support everyday tasks without predefined knowledge,
TASC constructs an open-vocabulary interaction graph from visual input to
represent functional object relationships, and infers user intent accordingly.
A shared control policy then provides rotation assistance during both grasping
and object interaction, guided by spatial constraints predicted by a
vision-language model. Our method addresses two key challenges in
general-purpose, long-horizon shared control: (1) understanding and inferring
task-level user intent, and (2) generalizing assistance across diverse objects
and tasks. Experiments in both simulation and the real world demonstrate that
TASC improves task efficiency and reduces user input effort compared to prior
methods. To the best of our knowledge, this is the first shared control
framework that supports everyday manipulation tasks with zero-shot
generalization. The code that supports our experiments is publicly available at
https://github.com/fitz0401/tasc.
☆ Self-supervised Learning Of Visual Pose Estimation Without Pose Labels By Classifying LED States
We introduce a model for monocular RGB relative pose estimation of a ground
robot that trains from scratch without pose labels or prior knowledge about
the robot's shape or appearance. At training time, we assume: (i) a robot
fitted with multiple LEDs, whose states are independent and known at each
frame; (ii) knowledge of the approximate viewing direction of each LED; and
(iii) availability of a calibration image with a known target distance, to
address the ambiguity of monocular depth estimation. Training data is collected
by a pair of robots moving randomly without needing external infrastructure or
human supervision. Our model trains on the task of predicting from an image the
state of each LED on the robot. In doing so, it learns to predict the position
of the robot in the image, its distance, and its relative bearing. At inference
time, the state of the LEDs is unknown, can be arbitrary, and does not affect
the pose estimation performance. Quantitative experiments indicate that our
approach: is competitive with state-of-the-art approaches that require supervision from pose
labels or a CAD model of the robot; generalizes to different domains; and
handles multi-robot pose estimation.
comment: accepted at CoRL 2025
☆ Data-fused Model Predictive Control with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework
that combines physics-based models with data-driven representations of unknown
dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium
formulation, the method enables tracking of changing, potentially unreachable
setpoints while explicitly handling measurement noise through slack variables
and regularization. We provide guarantees of recursive feasibility and
practical stability under input-output constraints for a specific class of
reference signals. The approach is validated on the iRonCub flying humanoid
robot, integrating analytical momentum models with data-driven turbine
dynamics. Simulations show improved tracking and robustness compared to a
purely model-based MPC, while maintaining real-time feasibility.
comment: 8 pages, 3 figures
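Willems' Fundamental Lemma requires the recorded input data to be persistently exciting, which for a scalar input means that the Hankel matrix of the chosen depth has full row rank. The sketch below checks that condition for the single-input case using exact rational arithmetic. The function names are hypothetical and this is not the DFMPC implementation.

```python
from fractions import Fraction

def hankel(signal, depth):
    """Hankel matrix with `depth` rows: row i holds signal[i], signal[i+1], ..."""
    cols = len(signal) - depth + 1
    return [[signal[i + j] for j in range(cols)] for i in range(depth)]

def rank(matrix):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def persistently_exciting(signal, order):
    """True if the scalar input sequence is persistently exciting of this order."""
    return rank(hankel(signal, order)) == order

constant_input = [1] * 10
varied_input = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
constant_ok = persistently_exciting(constant_input, 2)
varied_ok = persistently_exciting(varied_input, 3)
```

A constant input fails the test because every Hankel row is identical, which is why data-driven predictors of this kind need sufficiently rich excitation during data collection.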
☆ Acetrans: An Autonomous Corridor-Based and Efficient UAV Suspended Transport System
Unmanned aerial vehicles (UAVs) with suspended payloads offer significant
advantages for aerial transportation in complex and cluttered environments.
However, existing systems face critical limitations, including unreliable
perception of the cable-payload dynamics, inefficient planning in large-scale
environments, and the inability to guarantee whole-body safety under cable
bending and external disturbances. This paper presents Acetrans, an Autonomous,
Corridor-based, and Efficient UAV suspended transport system that addresses
these challenges through a unified perception, planning, and control framework.
A LiDAR-IMU fusion module is proposed to jointly estimate both payload pose and
cable shape under taut and bent modes, enabling robust whole-body state
estimation and real-time filtering of cable point clouds. To enhance planning
scalability, we introduce the Multi-size-Aware Configuration-space Iterative
Regional Inflation (MACIRI) algorithm, which generates safe flight corridors
while accounting for varying UAV and payload geometries. A spatio-temporal,
corridor-constrained trajectory optimization scheme is then developed to ensure
dynamically feasible and collision-free trajectories. Finally, a nonlinear
model predictive controller (NMPC) augmented with cable-bending constraints
provides robust whole-body safety during execution. Simulation and experimental
results validate the effectiveness of Acetrans, demonstrating substantial
improvements in perception accuracy, planning efficiency, and control safety
compared to state-of-the-art methods.
☆ Robot guide with multi-agent control and automatic scenario generation with LLM
This work describes the development of a hybrid control architecture for an
anthropomorphic tour guide robot, combining a multi-agent resource management
system with automatic behavior scenario generation based on large language
models. The proposed approach aims to overcome the limitations of traditional
systems, which rely on manual tuning of behavior scenarios. These limitations
include manual configuration, low flexibility, and lack of naturalness in robot
behavior. The process of preparing tour scenarios is implemented through a
two-stage generation: first, a stylized narrative is created, then non-verbal
action tags are integrated into the text. The multi-agent system ensures
coordination and conflict resolution during the execution of parallel actions,
as well as maintaining default behavior after the completion of main
operations, contributing to more natural robot behavior. The results obtained
from the trial demonstrate the potential of the proposed approach for
automating and scaling social robot control systems.
comment: 14 pages, 5 figures, 2 tables, 1 demo-video and repository link
☆ GundamQ: Multi-Scale Spatio-Temporal Representation Learning for Robust Robot Path Planning
In dynamic and uncertain environments, robotic path planning demands accurate
spatiotemporal environment understanding combined with robust decision-making
under partial observability. However, current deep reinforcement learning-based
path planning methods face two fundamental limitations: (1) insufficient
modeling of multi-scale temporal dependencies, resulting in suboptimal
adaptability in dynamic scenarios, and (2) inefficient exploration-exploitation
balance, leading to degraded path quality. To address these challenges, we
propose GundamQ: A Multi-Scale Spatiotemporal Q-Network for Robotic Path
Planning. The framework comprises two key modules: (i) the Spatiotemporal
Perception module, which hierarchically extracts multi-granularity spatial
features and multi-scale temporal dependencies ranging from instantaneous to
extended time horizons, thereby improving perception accuracy in dynamic
environments; and (ii) the Adaptive Policy Optimization module, which balances
exploration and exploitation during training while optimizing for smoothness
and collision probability through constrained policy updates. Experiments in
dynamic environments demonstrate that GundamQ achieves a 15.3% improvement in
success rate and a 21.7% increase in overall path quality, significantly
outperforming existing state-of-the-art methods.
comment: 6 pages, 5 figures
☆ A Holistic Architecture for Monitoring and Optimization of Robust Multi-Agent Path Finding Plan Execution
The goal of Multi-Agent Path Finding (MAPF) is to find a set of paths for a
fleet of agents moving in a shared environment such that the agents reach their
goals without colliding with each other. In practice, some of the robots
executing the plan may get delayed, which can introduce collision risk.
Although robust execution methods are used to ensure safety even in the
presence of delays, the delays may still have a significant impact on the
duration of the execution. At some point, the accumulated delays may become
significant enough that instead of continuing with the execution of the
original plan, even if it was optimal, there may now exist an alternate plan
which will lead to a shorter execution. However, the problem is how to decide
when to search for the alternate plan, since it is a costly procedure. In this
paper, we propose a holistic architecture for the robust execution, monitoring,
and optimization of MAPF plans. We exploit a robust execution method called
Action Dependency Graph to maintain an estimate of the expected execution
duration during the plan's execution. This estimate is used to predict the
potential that finding an alternate plan would lead to shorter execution. We
empirically evaluate the architecture in experiments in a real-time simulator
which we designed to mimic our real-life demonstrator of an autonomous
warehouse robotic fleet.
comment: 23 pages, 10 figures
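One simple way to picture an execution-duration estimate over a dependency graph of actions is a longest-path (critical-path) computation. The sketch below is a simplified stand-in under that assumption: the action names, durations, and dependency edges are hypothetical, and the paper's actual Action Dependency Graph estimate may differ.

```python
from graphlib import TopologicalSorter

def estimated_makespan(durations, depends_on):
    """Finish time of the last action when each action starts as soon as
    all actions it depends on have finished."""
    graph = {action: depends_on.get(action, ()) for action in durations}
    finish = {}
    for action in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[action]), default=0.0)
        finish[action] = start + durations[action]
    return max(finish.values())

# Two agents with two steps each. Agent 2's second step must wait until
# agent 1 has completed its first step (for example, vacated a shared cell).
durations = {"a1_step1": 1.0, "a1_step2": 1.0, "a2_step1": 1.0, "a2_step2": 1.0}
depends_on = {
    "a1_step2": ["a1_step1"],
    "a2_step2": ["a2_step1", "a1_step1"],
}

nominal = estimated_makespan(durations, depends_on)
# A delay of 3 time units on agent 1's first step propagates to both agents.
delayed = estimated_makespan({**durations, "a1_step1": 4.0}, depends_on)
```

Comparing the delayed estimate against the expected duration of a freshly computed plan is one way to decide whether a costly replanning call is worthwhile.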
☆ DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning
This letter introduces DiffAero, a lightweight, GPU-accelerated, and fully
differentiable simulation framework designed for efficient quadrotor control
policy learning. DiffAero supports both environment-level and agent-level
parallelism and integrates multiple dynamics models, customizable sensor stacks
(IMU, depth camera, and LiDAR), and diverse flight tasks within a unified,
GPU-native training interface. By fully parallelizing both physics and
rendering on the GPU, DiffAero eliminates CPU-GPU data transfer bottlenecks and
delivers orders-of-magnitude improvements in simulation throughput. In contrast
to existing simulators, DiffAero not only provides high-performance simulation
but also serves as a research platform for exploring differentiable and hybrid
learning algorithms. Extensive benchmarks and real-world flight experiments
demonstrate that DiffAero and hybrid learning algorithms combined can learn
robust flight policies in hours on consumer-grade hardware. The code is
available at https://github.com/flyingbitac/diffaero.
comment: 8 pages, 11 figures, 1 table
☆ CaR1: A Multi-Modal Baseline for BEV Vehicle Segmentation via Camera-Radar Fusion
Santiago Montiel-Marín, Angel Llamazares, Miguel Antunes-García, Fabio Sánchez-García, Luis M. Bergasa
Camera-radar fusion offers a robust and cost-effective alternative to
LiDAR-based autonomous driving systems by combining complementary sensing
capabilities: cameras provide rich semantic cues but unreliable depth, while
radar delivers sparse yet reliable position and motion information. We
introduce CaR1, a novel camera-radar fusion architecture for BEV vehicle
segmentation. Built upon BEVFusion, our approach incorporates a grid-wise radar
encoding that discretizes point clouds into structured BEV features and an
adaptive fusion mechanism that dynamically balances sensor contributions.
Experiments on nuScenes demonstrate competitive segmentation performance (57.6
IoU), on par with state-of-the-art methods. Code is publicly available at
https://www.github.com/santimontiel/car1.
comment: 4 pages, 2 figures
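The two ingredients named in the abstract, discretizing radar points into a BEV grid and scoring a segmentation with IoU, can be sketched as follows. This is a minimal illustration with assumed grid ranges and cell size. It only counts points per cell, whereas the CaR1 encoder produces learned features.

```python
def radar_to_bev_grid(points, x_range=(-4.0, 4.0), y_range=(-4.0, 4.0), cell=1.0):
    """Count radar points per BEV cell. Points outside the ranges are dropped."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = [[0] * ny for _ in range(nx)]
    for x, y in points:
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            i = int((x - x_range[0]) / cell)
            j = int((y - y_range[0]) / cell)
            grid[i][j] += 1
    return grid

def iou(pred, target):
    """Intersection over union of two binary masks given as nested lists."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += bool(p and t)
            union += bool(p or t)
    return inter / union if union else 0.0

points = [(0.2, 0.3), (0.7, 0.9), (-3.5, 2.1), (10.0, 0.0)]  # last one is out of range
grid = radar_to_bev_grid(points)
score = iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```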
☆ Efficient Learning-Based Control of a Legged Robot in Lunar Gravity
Legged robots are promising candidates for exploring challenging areas on
low-gravity bodies such as the Moon, Mars, or asteroids, thanks to their
advanced mobility on unstructured terrain. However, as planetary robots' power
and thermal budgets are highly restricted, these robots need energy-efficient
control approaches that easily transfer to multiple gravity environments. In
this work, we introduce a reinforcement learning-based control approach for
legged robots with gravity-scaled power-optimized reward functions. We use our
approach to develop and validate a locomotion controller and a base pose
controller in gravity environments from lunar gravity (1.62 m/s^2) to a
hypothetical super-Earth (19.62 m/s^2). Our approach successfully scales across
these gravity levels for locomotion and base pose control with the
gravity-scaled reward functions. The power-optimized locomotion controller
reached a power consumption for locomotion of 23.4 W in Earth gravity on a
15.65 kg robot at 0.4 m/s, a 23 % improvement over the baseline policy.
Additionally, we designed a constant-force spring offload system that allowed
us to conduct real-world experiments on legged locomotion in lunar gravity. In
lunar gravity, the power-optimized control policy reached 12.2 W, 36 % less
than a baseline controller which is not optimized for power efficiency. Our
method provides a scalable approach to developing power-efficient locomotion
controllers for legged robots across multiple gravity levels.
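The reported figures can be put in context with two small calculations. Both rest on assumptions not stated in the abstract: the dimensionless cost of transport P / (m g v) is a common locomotion metric but is not one the authors report, Earth gravity is taken as 9.81 m/s^2, and "36 % less than a baseline" is read as a fractional reduction in power.

```python
def cost_of_transport(power_w, mass_kg, gravity_m_s2, speed_m_s):
    """Dimensionless cost of transport: power / (weight * speed)."""
    return power_w / (mass_kg * gravity_m_s2 * speed_m_s)

def implied_baseline_power(power_w, fractional_reduction):
    """Baseline power implied by a stated fractional reduction."""
    return power_w / (1.0 - fractional_reduction)

# Earth-gravity figures from the abstract: 23.4 W, 15.65 kg, 0.4 m/s.
# The 9.81 m/s^2 value is an assumption (half the stated super-Earth gravity).
earth_cot = cost_of_transport(23.4, 15.65, 9.81, 0.4)

# Lunar-gravity figure from the abstract: 12.2 W, 36 % less than the baseline.
lunar_baseline_w = implied_baseline_power(12.2, 0.36)
```

The lunar cost of transport is left out because the abstract does not state the walking speed used in the lunar-gravity experiments.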
☆ HHI-Assist: A Dataset and Benchmark of Human-Human Interaction in Physical Assistance Scenario
The increasing labor shortage and aging population underline the need for
assistive robots to support human care recipients. To enable safe and
responsive assistance, robots require accurate human motion prediction in
physical interaction scenarios. However, this remains a challenging task due to
the variability of assistive settings and the complexity of coupled dynamics in
physical interactions. In this work, we address these challenges through two
key contributions: (1) HHI-Assist, a dataset comprising motion capture clips of
human-human interactions in assistive tasks; and (2) a conditional
Transformer-based denoising diffusion model for predicting the poses of
interacting agents. Our model effectively captures the coupled dynamics between
caregivers and care receivers, demonstrating improvements over baselines and
strong generalization to unseen scenarios. By advancing interaction-aware
motion prediction and introducing a new dataset, our work has the potential to
significantly enhance robotic assistance policies. The dataset and code are
available at: https://sites.google.com/view/hhi-assist/home
comment: Accepted to RA-L 2025
☆ Prespecified-Performance Kinematic Tracking Control for Aerial Manipulation
This paper studies the kinematic tracking control problem for aerial
manipulators. Existing kinematic tracking control methods, which typically
employ proportional-derivative feedback or tracking-error-based feedback
strategies, may fail to achieve tracking objectives within specified time
constraints. To address this limitation, we propose a novel control framework
comprising two key components: end-effector tracking control based on a
user-defined preset trajectory and quadratic programming-based reference
allocation. Compared with state-of-the-art approaches, the proposed method has
several attractive features. First, it ensures that the end-effector reaches
the desired position within a preset time while keeping the tracking error
within a performance envelope that reflects task requirements. Second,
quadratic programming is employed to allocate the references of the quadcopter
base and the Delta arm, while considering the physical constraints of the
aerial manipulator, thus preventing solutions that may violate physical
limitations. The proposed approach is validated through three experiments.
Experimental results demonstrate the effectiveness of the proposed algorithm
and its capability to guarantee that the target position is reached within the
preset time.
☆ TwinTac: A Wide-Range, Highly Sensitive Tactile Sensor with Real-to-Sim Digital Twin Sensor Model IROS 2025
Robot skill acquisition processes driven by reinforcement learning often rely
on simulations to efficiently generate large-scale interaction data. However,
the absence of simulation models for tactile sensors has hindered the use of
tactile sensing in such skill learning processes, limiting the development of
effective policies driven by tactile perception. To bridge this gap, we present
TwinTac, a system that combines the design of a physical tactile sensor with
its digital twin model. Our hardware sensor is designed for high sensitivity
and a wide measurement range, enabling high quality sensing data essential for
object interaction tasks. Building upon the hardware sensor, we develop the
digital twin model using a real-to-sim approach. This involves collecting
synchronized cross-domain data, including finite element method results and the
physical sensor's outputs, and then training neural networks to map simulated
data to real sensor responses. Through experimental evaluation, we
characterized the sensitivity of the physical sensor and demonstrated the
consistency of the digital twin in replicating the physical sensor's output.
Furthermore, by conducting an object classification task, we showed that
simulation data generated by our digital twin sensor can effectively augment
real-world data, leading to improved accuracy. These results highlight
TwinTac's potential to bridge the gap in cross-domain learning tasks.
comment: 7 pages, 9 figures, 1 table, to be published in IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS 2025)
☆ Design and Evaluation of Two Spherical Systems for Mobile 3D Mapping
Spherical robots offer unique advantages for mapping applications in
hazardous or confined environments, thanks to their protective shells and
omnidirectional mobility. This work presents two complementary spherical
mapping systems: a lightweight, non-actuated design and an actuated variant
featuring internal pendulum-driven locomotion. Both systems are equipped with a
Livox Mid-360 solid-state LiDAR sensor and run LiDAR-Inertial Odometry (LIO)
algorithms on resource-constrained hardware. We assess the mapping accuracy of
these systems by comparing the resulting 3D point clouds from the LIO
algorithms to a ground truth map. The results indicate that the performance of
state-of-the-art LIO algorithms deteriorates due to the highly dynamic movement
introduced by the spherical locomotion, leading to globally inconsistent maps
and sometimes unrecoverable drift.
comment: 6 Pages, 9 figures, International Workshop 3D-AdViCE in conjunction
with 12th ECMR 2025
☆ Efficient and Accurate Downfacing Visual Inertial Odometry
Visual Inertial Odometry (VIO) is a widely used computer vision method that
determines an agent's movement through a camera and an IMU sensor. This paper
presents an efficient and accurate VIO pipeline optimized for applications on
micro- and nano-UAVs. The proposed design incorporates state-of-the-art feature
detection and tracking methods (SuperPoint, PX4FLOW, ORB), all optimized and
quantized for emerging RISC-V-based ultra-low-power parallel systems on chips
(SoCs). Furthermore, by employing a rigid body motion model, the pipeline
reduces estimation errors and achieves improved accuracy in planar motion
scenarios. The pipeline's suitability for real-time VIO is assessed on an
ultra-low-power SoC in terms of compute requirements and tracking accuracy
after quantization. The pipeline, including the three feature tracking methods,
was implemented on the SoC for real-world validation. This design bridges the
gap between high-accuracy VIO pipelines that are traditionally run on
computationally powerful systems and lightweight implementations suitable for
microcontrollers. The optimized pipeline on the GAP9 low-power SoC reduces
average RMSE by up to a factor of 3.65 relative to the baseline
pipeline when using the ORB feature tracker. The analysis of the computational
complexity of the feature trackers further shows that PX4FLOW achieves on-par
tracking accuracy with ORB at a lower runtime for movement speeds below 24
pixels/frame.
comment: This article has been accepted for publication in the IEEE Internet
of Things Journal (IoT-J)
☆ Towards simulation-based optimization of compliant fingers for high-speed connector assembly
Mechanical compliance is a key design parameter for dynamic contact-rich
manipulation, affecting task success, safety, and robustness to variation in contact
geometry. Design of soft robotic structures, such as compliant
fingers, requires choosing design parameters which affect geometry and
stiffness, and therefore manipulation performance and robustness. Today, these
parameters are chosen through either hardware iteration, which takes
significant development time, or simplified models (e.g., planar), which cannot
address complex manipulation task objectives. Improvements in dynamic
simulation, especially with contact and friction modeling, present a potential
design tool for mechanical compliance. We propose a simulation-based design
tool for compliant mechanisms which allows design with respect to task-level
objectives, such as success rate. This is applied to optimize design parameters
of a structured compliant finger to reduce failure cases inside a tolerance
window in insertion tasks. The improvement in robustness is then validated on a
real robot using tasks from the benchmark NIST task board. The finger stiffness
affects the tolerance window: optimized parameters can increase tolerable
ranges by a factor of 2.29, with workpiece variation up to 8.6 mm being
compensated. However, the trends remain task-specific. In some tasks, the
highest stiffness yields the widest tolerable range, whereas in others the
opposite is observed, motivating the need for design tools which can consider
application-specific geometry and dynamics.
☆ Gaussian path model library for intuitive robot motion programming by demonstration
This paper presents a system for generating Gaussian path models from
teaching data representing the path shape. In addition, methods for using these
path models to classify human demonstrations of paths are introduced. By
generating a library of multiple Gaussian path models of various shapes, human
demonstrations can be used for intuitive robot motion programming. A method for
modifying existing Gaussian path models by demonstration through geometric
analysis is also presented.
☆ Detection of Anomalous Behavior in Robot Systems Based on Machine Learning
Ensuring the safe and reliable operation of robotic systems is paramount to
prevent potential disasters and safeguard human well-being. Despite rigorous
design and engineering practices, these systems can still experience
malfunctions, leading to safety risks. In this study, we present a machine
learning-based approach for detecting anomalies in system logs to enhance the
safety and reliability of robotic systems. We collected logs from two distinct
scenarios using CoppeliaSim and comparatively evaluated several machine
learning models, including Logistic Regression (LR), Support Vector Machine
(SVM), and an Autoencoder. Our system was evaluated in a quadcopter context
(Context 1) and a Pioneer robot context (Context 2). Results showed that while
LR demonstrated superior performance in Context 1, the Autoencoder model proved
to be the most effective in Context 2. This highlights that the optimal model
choice is context-dependent, likely due to the varying complexity of anomalies
across different robotic platforms. This research underscores the value of a
comparative approach and demonstrates the particular strengths of autoencoders
for detecting complex anomalies in robotic systems.
♻ ☆ Repeatable Energy-Efficient Perching for Flapping-Wing Robots Using Soft Grippers
With the emergence of new flapping-wing micro aerial vehicle (FWMAV) designs,
a need for extensive and advanced mission capabilities arises. FWMAVs try to
adapt and emulate the flight features of birds and flying insects. While
current designs already achieve high manoeuvrability, they still almost
entirely lack perching and take-off abilities. These capabilities could, for
instance, enable long-term monitoring and surveillance missions, and operations
in cluttered environments or in proximity to humans and animals. We present the
development and testing of a framework that enables repeatable perching and
take-off for small to medium-sized FWMAVs, utilising soft, non-damaging
grippers. Thanks to its novel active-passive actuation system, an
energy-conserving state can be achieved and indefinitely maintained while the
vehicle is perched. A prototype of the proposed system weighing under 39 g was
manufactured and extensively tested on a 110 g flapping-wing robot. Successful
free-flight tests demonstrated the full mission cycle of landing, perching and
subsequent take-off. The telemetry data recorded during the flights yields
extensive insight into the system's behaviour and is a valuable step towards
full automation and optimisation of the entire take-off and landing cycle.
comment: 16 pages, 16 figures, 5 multimedia extensions
♻ ☆ A Conflicts-free, Speed-lossless KAN-based Reinforcement Learning Decision System for Interactive Driving in Roundabouts
Safety and efficiency are crucial for autonomous driving in roundabouts,
especially in mixed traffic with both autonomous vehicles (AVs) and human-driven
vehicles. This paper presents a learning-based algorithm that promotes safe and
efficient driving across varying roundabout traffic conditions. A deep
Q-learning network is used to learn optimal strategies in complex multi-vehicle
roundabout scenarios, while a Kolmogorov-Arnold Network (KAN) improves the AVs'
environmental understanding. To further enhance safety, an action inspector
filters unsafe actions, and a route planner optimizes driving efficiency.
Moreover, model predictive control ensures stability and precision in
execution. Experimental results demonstrate that the proposed system
consistently outperforms state-of-the-art methods, achieving fewer collisions,
reduced travel time, and stable training with smooth reward convergence.
comment: 14 pages, 11 figures, published in IEEE Transactions on Intelligent
Transportation Systems
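The action-inspector idea in the abstract above can be illustrated with a minimal sketch, assuming the inspector is exposed as a boolean safety mask over a discrete action set. The function name `select_safe_action`, the mask representation, and the fallback rule are illustrative assumptions, not the paper's implementation.

```python
from typing import Sequence


def select_safe_action(
    q_values: Sequence[float],
    is_safe: Sequence[bool],
    fallback: int = 0,
) -> int:
    """Return the index of the highest-Q action among those marked safe.

    `fallback` is a hypothetical default (for example, a braking action)
    returned when the mask rules out every action.
    """
    if len(q_values) != len(is_safe):
        raise ValueError("q_values and is_safe must have the same length")
    # Keep only the indices of actions the inspector allows.
    safe_indices = [i for i, ok in enumerate(is_safe) if ok]
    if not safe_indices:
        return fallback
    # Greedy choice restricted to the safe subset.
    return max(safe_indices, key=lambda i: q_values[i])
```

Masking at selection time leaves the learned Q-values untouched, so the filter can be tightened or relaxed without retraining in this simplified setting.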
♻ ☆ Environmental force sensing helps robots traverse cluttered large obstacles using physical interaction
Many applications require robots to move through complex 3-D terrain with
large obstacles, such as self-driving, search and rescue, and extraterrestrial
exploration. Although robots are already excellent at avoiding sparse
obstacles, they still struggle to traverse cluttered large obstacles. To make
progress, we need to better understand how to use and control the physical
interaction with obstacles to traverse them. Forest floor-dwelling cockroaches
can use physical interaction to transition between different locomotor modes to
traverse flexible, grass-like beams of a large range of stiffness. Inspired by
this, here we studied whether and how environmental force sensing helps robots
make active adjustments to traverse cluttered large obstacles. We developed a
physics model and a simulation of a minimalistic robot capable of sensing
environmental forces during traversal of beam obstacles. Then, we developed a
force-feedback control strategy, which estimated beam stiffness from the sensed
contact force using the physics model. Then in simulation we used the estimated
stiffness to control the robot to either stay in or transition to the more
favorable locomotor modes to traverse. When beams were stiff, force sensing
induced the robot to transition from a more costly pitch mode to a less costly
roll mode, which helped the robot traverse with a higher success rate and less
energy consumed. By contrast, if the robot simply pushed forward or always
avoided obstacles, it would consume more energy, become stuck in front of
beams, or even flip over. When the beams were flimsy, force sensing guided the
robot to simply push across the beams. In addition, we demonstrated the
robustness of beam stiffness estimation against body oscillations, randomness
in oscillation, and uncertainty in position sensing. We also found that a
shorter sensorimotor delay reduced energy cost of traversal.
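As a rough illustration of the force-feedback strategy summarized above, the sketch below assumes the beam behaves like a rigid plate on a torsional spring at its base, so that stiffness is torque divided by deflection angle. The beam model, the function names, and the threshold rule are all assumptions made for this example; the abstract does not give the paper's actual physics model.

```python
def estimate_beam_stiffness(
    contact_force_n: float,
    contact_height_m: float,
    deflection_rad: float,
) -> float:
    """Estimate torsional stiffness (N*m/rad) as k = F * h / theta.

    Assumes the sensed force acts perpendicular to the beam at height h
    above the torsional joint (a simplifying assumption).
    """
    if deflection_rad <= 0.0:
        raise ValueError("deflection_rad must be positive")
    return contact_force_n * contact_height_m / deflection_rad


def choose_mode(stiffness: float, stiff_threshold: float) -> str:
    """Pick 'roll' for stiff beams and 'push' for flimsy ones.

    `stiff_threshold` is a hypothetical tuning parameter.
    """
    return "roll" if stiffness >= stiff_threshold else "push"
```

For example, `choose_mode(estimate_beam_stiffness(2.0, 0.1, 0.5), stiff_threshold=0.3)` selects the roll mode, mirroring the abstract's observation that stiff beams favor rolling while flimsy beams can simply be pushed across.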
♻ ☆ LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
Predictive manipulation has recently gained considerable attention in the
Embodied AI community due to its potential to improve robot policy performance
by leveraging predicted states. However, generating accurate future visual
states of robot-object interactions from world models remains a well-known
challenge, particularly in achieving high-quality pixel-level representations.
To this end, we propose LaDi-WM, a world model that predicts the latent space
of future states using diffusion modeling. Specifically, LaDi-WM leverages the
well-established latent space aligned with pre-trained Visual Foundation Models
(VFMs), which comprises both geometric features (DINO-based) and semantic
features (CLIP-based). We find that predicting the evolution of the latent
space is easier to learn and more generalizable than directly predicting
pixel-level images. Building on LaDi-WM, we design a diffusion policy that
iteratively refines output actions by incorporating forecasted states, thereby
generating more consistent and accurate results. Extensive experiments on both
synthetic and real-world benchmarks demonstrate that LaDi-WM significantly
enhances policy performance by 27.9% on the LIBERO-LONG benchmark and 20% on
the real-world scenario. Furthermore, our world model and policies achieve
impressive generalizability in real-world experiments.
comment: CoRL 2025
♻ ☆ MiniTac: An Ultra-Compact 8 mm Vision-Based Tactile Sensor for Enhanced Palpation in Robot-Assisted Minimally Invasive Surgery
Robot-assisted minimally invasive surgery (RAMIS) provides substantial
benefits over traditional open and laparoscopic methods. However, a significant
limitation of RAMIS is the surgeon's inability to palpate tissues, a crucial
technique for examining tissue properties and detecting abnormalities,
restricting the widespread adoption of RAMIS. To overcome this obstacle, we
introduce MiniTac, a novel vision-based tactile sensor with an ultra-compact
cross-sectional diameter of 8 mm, designed for seamless integration into
mainstream RAMIS devices, particularly the Da Vinci surgical systems. MiniTac
features a novel mechanoresponsive photonic elastomer membrane that changes
color distribution under varying contact pressures. This color change is
captured by an embedded miniature camera, allowing MiniTac to detect tumors
both on the tissue surface and in deeper layers typically obscured from
endoscopic view. MiniTac's efficacy has been rigorously tested on both phantoms
and ex-vivo tissues. By leveraging advanced mechanoresponsive photonic
materials, MiniTac represents a significant advancement in integrating tactile
sensing into RAMIS, potentially expanding its applicability to a wider array of
clinical scenarios that currently rely on traditional surgical approaches.
comment: accepted for publication in the IEEE Robotics and Automation Letters
(RA-L)
♻ ☆ Embedding high-resolution touch across robotic hands enables adaptive human-like grasping
Zihang Zhao, Wanlin Li, Yuyang Li, Tengyu Liu, Boren Li, Meng Wang, Kai Du, Hangxin Liu, Yixin Zhu, Qining Wang, Kaspar Althoefer, Song-Chun Zhu
Developing robotic hands that adapt to real-world dynamics remains a
fundamental challenge in robotics and machine intelligence. Despite significant
advances in replicating human hand kinematics and control algorithms, robotic
systems still struggle to match human capabilities in dynamic environments,
primarily due to inadequate tactile feedback. To bridge this gap, we present
F-TAC Hand, a biomimetic hand featuring high-resolution tactile sensing (0.1 mm
spatial resolution) across 70% of its surface area. Through optimized hand
design, we overcome traditional challenges in integrating high-resolution
tactile sensors while preserving the full range of motion. The hand, powered by
our generative algorithm that synthesizes human-like hand configurations,
demonstrates robust grasping capabilities in dynamic real-world conditions.
Extensive evaluation across 600 real-world trials demonstrates that this
tactile-embodied system significantly outperforms non-tactile-informed
alternatives in complex manipulation tasks (p<0.0001). These results provide
empirical evidence for the critical role of rich tactile embodiment in
developing advanced robotic intelligence, offering new perspectives on the
relationship between physical sensing capabilities and intelligent behavior.
♻ ☆ Tac-Man: Tactile-Informed Prior-Free Manipulation of Articulated Objects
Integrating robots into human-centric environments such as homes
necessitates advanced manipulation skills, as robotic devices will need to
engage with articulated objects like doors and drawers. Key challenges in
robotic manipulation of articulated objects are the unpredictability and
diversity of these objects' internal structures, which render models based on
object kinematics priors, both explicit and implicit, inadequate. Their
reliability is significantly diminished by pre-interaction ambiguities,
imperfect structural parameters, encounters with unknown objects, and
unforeseen disturbances. Here, we present a prior-free strategy, Tac-Man,
focusing on maintaining stable robot-object contact during manipulation.
Without relying on object priors, Tac-Man leverages tactile feedback to enable
robots to proficiently handle a variety of articulated objects, including those
with complex joints, even when influenced by unexpected disturbances.
Demonstrated in both real-world experiments and extensive simulations, it
consistently achieves near-perfect success in dynamic and varied settings,
outperforming existing methods. Our results indicate that tactile sensing alone
suffices for managing diverse articulated objects, offering greater robustness
and generalization than prior-based approaches. This underscores the importance
of detailed contact modeling in complex manipulation tasks, especially with
articulated objects. Advancements in tactile-informed approaches significantly
expand the scope of robotic applications in human-centric environments,
particularly where accurate models are difficult to obtain. See additional
material at https://tacman-aom.github.io.
comment: Accepted for publication in the IEEE Transactions on Robotics (T-RO)
♻ ☆ TacMan-Turbo: Proactive Tactile Control for Robust and Efficient Articulated Object Manipulation
Adept manipulation of articulated objects is essential for robots to operate
successfully in human environments. Such manipulation requires both
effectiveness -- reliable operation despite uncertain object structures -- and
efficiency -- swift execution with minimal redundant steps and smooth actions.
Existing approaches struggle to achieve both objectives simultaneously: methods
relying on predefined kinematic models lack effectiveness when encountering
structural variations, while tactile-informed approaches achieve robust
manipulation without kinematic priors but compromise efficiency through
reactive, step-by-step exploration-compensation cycles. This paper introduces
TacMan-Turbo, a novel proactive tactile control framework for articulated
object manipulation that resolves this fundamental trade-off. Unlike previous
approaches that treat tactile contact deviations merely as error signals
requiring compensation, our method interprets these deviations as rich sources
of local kinematic information. This new perspective enables our controller to
predict optimal future interactions and make proactive adjustments,
significantly enhancing manipulation efficiency. In comprehensive evaluations
across 200 diverse simulated articulated objects and real-world experiments,
our approach maintains a 100% success rate while significantly outperforming
the previous tactile-informed method in time efficiency, action efficiency, and
trajectory smoothness (all p-values < 0.0001). These results demonstrate that
the long-standing trade-off between effectiveness and efficiency in articulated
object manipulation can be successfully resolved without relying on prior
kinematic knowledge.
♻ ☆ B*: Efficient and Optimal Base Placement for Fixed-Base Manipulators
B* is a novel optimization framework that addresses a critical challenge in
fixed-base manipulator robotics: optimal base placement. Current methods rely
on pre-computed kinematics databases generated through sampling to search for
solutions. However, they face an inherent trade-off between solution optimality
and computational efficiency when determining sampling resolution. To address
these limitations, B* unifies multiple objectives without database dependence.
The framework employs a two-layer hierarchical approach. The outer layer
systematically manages terminal constraints through progressive tightening,
particularly for base mobility, enabling feasible initialization and broad
solution exploration. The inner layer addresses non-convexities in each
outer-layer subproblem through sequential local linearization, converting the
original problem into tractable sequential linear programming (SLP). Testing
across multiple robot platforms demonstrates B*'s effectiveness. The framework
achieves solution optimality five orders of magnitude better than
sampling-based approaches while maintaining perfect success rates and reduced
computational overhead. Operating directly in configuration space, B* enables
simultaneous path planning with customizable optimization criteria. B* serves
as a crucial initialization tool that bridges the gap between theoretical
motion planning and practical deployment, where feasible trajectory existence
is fundamental.
comment: accepted for publication in the IEEE Robotics and Automation Letters
(RA-L)
♻ ☆ Afford-X: Generalizable and Slim Affordance Reasoning for Task-oriented Manipulation
Object affordance reasoning, the ability to infer object functionalities
based on physical properties, is fundamental for task-oriented planning and
activities in both humans and Artificial Intelligence (AI). This capability,
required for planning and executing daily activities in a task-oriented manner,
relies on commonsense knowledge of object physics and functionalities,
extending beyond simple object recognition. Current computational models for
affordance reasoning from perception lack generalizability, limiting their
applicability in novel scenarios. Meanwhile, comprehensive Large Language
Models (LLMs) with emerging reasoning capabilities are challenging to deploy on
local devices for task-oriented manipulations. Here, we introduce LVIS-Aff, a
large-scale dataset comprising 1,496 tasks and 119k images, designed to enhance
the generalizability of affordance reasoning from perception. Utilizing this
dataset, we develop Afford-X, an end-to-end trainable affordance reasoning
model that incorporates Verb Attention and Bi-Fusion modules to improve
multi-modal understanding. This model achieves up to a 12.1% performance
improvement over the best-reported results from non-LLM methods, while also
demonstrating a 1.2% enhancement compared to our previous conference paper.
Additionally, it maintains a compact 187M parameter size and infers nearly 50
times faster than the GPT-4V API. Our work demonstrates the potential for
efficient, generalizable affordance reasoning models that can be deployed on
local devices for task-oriented manipulations. We showcase Afford-X's
effectiveness in enabling task-oriented manipulations for robots across various
tasks and environments, underscoring its efficiency and broad implications for
advancing robotics and AI systems in real-world applications.
♻ ☆ GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Learning open-vocabulary physical skills for simulated agents presents a
significant challenge in artificial intelligence. Current reinforcement
learning approaches face critical limitations: manually designed rewards lack
scalability across diverse tasks, while demonstration-based methods struggle to
generalize beyond their training distribution. We introduce GROVE, a
generalized reward framework that enables open-vocabulary physical skill
learning without manual engineering or task-specific demonstrations. Our key
insight is that Large Language Models (LLMs) and Vision Language Models (VLMs)
provide complementary guidance -- LLMs generate precise physical constraints
capturing task requirements, while VLMs evaluate motion semantics and
naturalness. Through an iterative design process, VLM-based feedback
continuously refines LLM-generated constraints, creating a self-improving
reward system. To bridge the domain gap between simulation and natural images,
we develop Pose2CLIP, a lightweight mapper that efficiently projects agent
poses directly into semantic feature space without computationally expensive
rendering. Extensive experiments across diverse embodiments and learning
paradigms demonstrate GROVE's effectiveness, achieving 22.2% higher motion
naturalness and 25.7% better task completion scores while training 8.4x faster
than previous methods. These results establish a new foundation for scalable
physical skill acquisition in simulated environments.
♻ ☆ Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
Yuyang Li, Wenxin Du, Chang Yu, Puhao Li, Zihang Zhao, Tengyu Liu, Chenfanfu Jiang, Yixin Zhu, Siyuan Huang
Tactile sensing is crucial for achieving human-level robotic capabilities in
manipulation tasks. As a promising solution, Vision-Based Tactile Sensors
(VBTSs) offer high spatial resolution and cost-effectiveness, but present
unique challenges in robotics for their complex physical characteristics and
visual signal processing requirements. The lack of efficient and accurate
simulation tools for VBTSs has significantly limited the scale and scope of
tactile robotics research. We present Taccel, a high-performance simulation
platform that integrates IPC and ABD to model robots, tactile sensors, and
objects with both accuracy and unprecedented speed, achieving an 18-fold
acceleration over real-time across thousands of parallel environments. Unlike
previous simulators that operate at sub-real-time speeds with limited
parallelization, Taccel provides precise physics simulation and realistic
tactile signals while supporting flexible robot-sensor configurations through
user-friendly APIs. Through extensive validation in object recognition, robotic
grasping, and articulated object manipulation, we demonstrate precise
simulation and successful sim-to-real transfer. These capabilities position
Taccel as a powerful tool for scaling up tactile robotics research and
development, potentially transforming how robots interact with and understand
their physical environment.
♻ ☆ CTBC: Contact-Triggered Blind Climbing for Wheeled Bipedal Robots with Instruction Learning and Reinforcement Learning
In recent years, wheeled bipedal robots have gained increasing attention due
to their advantages in mobility, such as high-speed locomotion on flat terrain.
However, their performance in complex environments (e.g., staircases) remains
inferior to that of traditional legged robots. To overcome this limitation, we
propose a general contact-triggered blind climbing (CTBC) framework for wheeled
bipedal robots. Upon detecting wheel-obstacle contact, the robot triggers a
leg-lifting motion to overcome the obstacle. By leveraging a strongly-guided
feedforward trajectory, our method enables the robot to rapidly acquire agile
leg-lifting skills, significantly enhancing its capability to traverse
unstructured terrains. The approach has been experimentally validated and
successfully deployed on LimX Dynamics' wheeled bipedal robot, Tron1.
Real-world tests demonstrate that Tron1 can reliably climb obstacles well
beyond its wheel radius using only proprioceptive feedback.
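The contact-triggered feedforward idea above can be sketched as a time-indexed lift profile that starts when contact is detected. The half-sine shape, the function name, and the parameters below are illustrative assumptions; the abstract does not specify the trajectory actually used.

```python
import math


def feedforward_lift_height(
    t_since_contact: float,
    duration: float,
    peak_height: float,
) -> float:
    """Return a half-sine leg-lift reference height at a given time.

    The profile is zero outside [0, duration] and reaches `peak_height`
    at the midpoint. All three parameters are hypothetical.
    """
    if duration <= 0.0:
        raise ValueError("duration must be positive")
    if t_since_contact < 0.0 or t_since_contact > duration:
        return 0.0
    return peak_height * math.sin(math.pi * t_since_contact / duration)
```

A reference of this kind would be added to the policy's output only after the contact event, which is one simple way to guide early exploration toward lifting motions.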
♻ ☆ OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning
Yuecheng Liu, Dafeng Chi, Shiguang Wu, Zhanguang Zhang, Yuzheng Zhuang, Bowen Yang, He Zhu, Lingfeng Zhang, Pengwei Xie, David Gamaliel Arcos Bravo, Yingxue Zhang, Jianye Hao, Xingyue Quan
Recent advances in multimodal large language models (MLLMs) have opened new
opportunities for embodied intelligence, enabling multimodal understanding,
reasoning, and interaction, as well as continuous spatial decision-making.
Nevertheless, current MLLM-based embodied systems face two critical
limitations. First, Geometric Adaptability Gap: models trained solely on 2D
inputs or with hard-coded 3D geometry injection suffer from either insufficient
spatial information or restricted 2D generalization, leading to poor
adaptability across tasks with diverse spatial demands. Second, Embodiment
Constraint Gap: prior work often neglects the physical constraints and
capacities of real robots, resulting in task plans that are theoretically valid
but practically infeasible. To address these gaps, we introduce OmniEVA -- an
embodied versatile planner that enables advanced embodied reasoning and task
planning through two pivotal innovations: (1) a Task-Adaptive 3D Grounding
mechanism, which introduces a gated router to perform explicit selective
regulation of 3D fusion based on contextual requirements, enabling
context-aware 3D grounding for diverse embodied tasks; and (2) an Embodiment-Aware
Reasoning framework that jointly incorporates task goals and embodiment
constraints into the reasoning loop, resulting in planning decisions that are
both goal-directed and executable. Extensive experimental results demonstrate
that OmniEVA not only achieves state-of-the-art general embodied reasoning
performance, but also exhibits a strong ability across a wide range of
downstream scenarios. Evaluations on a suite of proposed embodied benchmarks,
including both primitive and composite tasks, confirm its robust and versatile
planning capabilities. Project page: https://omnieva.github.io
♻ ☆ RGBlimp-Q: Robotic Gliding Blimp With Moving Mass Control Based on a Bird-Inspired Continuum Arm
Robotic blimps, as lighter-than-air aerial platforms, offer extended
operational duration and enhanced safety in human-robot interactions due to
their buoyant lift. However, achieving robust flight performance under
environmental airflow disturbances remains a critical challenge, thereby
limiting their broader deployment. Inspired by avian flight mechanics,
particularly the ability of birds to perch and stabilize in turbulent wind
conditions, this article introduces RGBlimp-Q -- a robotic gliding blimp
equipped with a bird-inspired continuum arm featuring a novel moving mass
actuation mechanism. This continuum arm enables flexible attitude regulation
through internal mass redistribution, significantly enhancing the system's
resilience to external disturbances. In addition, it facilitates aerial
manipulation by employing end-effector claws that interact with the environment
in a manner analogous to avian perching behavior. This article presents the
design, modeling, and prototyping of RGBlimp-Q, supported by comprehensive
experimental evaluation and comparative analysis. To the best of the authors'
knowledge, this represents the first interdisciplinary integration of continuum
mechanisms into a lighter-than-air robotic platform, where the continuum arm
simultaneously functions as both an actuation and manipulation module. This
design establishes a novel paradigm for robotic blimps, expanding their
applicability to complex and dynamic environments.
♻ ☆ Kinetostatics and Particle-Swarm Optimization of Vehicle-Mounted Underactuated Metamorphic Loading Manipulators
Fixed degree-of-freedom (DoF) loading mechanisms often suffer from excessive
actuators, complex control, and limited adaptability to dynamic tasks. This
study proposes an innovative mechanism of underactuated metamorphic loading
manipulators (UMLM), integrating a metamorphic arm with a passively adaptive
gripper. The metamorphic arm exploits geometric constraints, enabling
topology reconfiguration and flexible motion trajectories without additional
actuators. The adaptive gripper, driven entirely by the arm, conforms to
diverse objects through passive compliance. A structural model is developed,
and a kinetostatics analysis is conducted to investigate isomorphic grasping
configurations. To optimize performance, Particle-Swarm Optimization (PSO) is
utilized to refine the gripper's dimensional parameters, ensuring robust
adaptability across various applications. Simulation results validate the
UMLM's easily implemented control strategy, operational versatility, and
effectiveness in grasping diverse objects in dynamic environments. This work
underscores the practical potential of underactuated metamorphic mechanisms in
applications requiring efficient and adaptable loading solutions. Beyond the
specific design, this generalized modeling and optimization framework extends
to a broader class of manipulators, offering a scalable approach to the
development of robotic systems that require efficiency, flexibility, and robust
performance.
comment: 50 pages, 19 figures
♻ ☆ Efficient Motion Sickness Assessment: Recreation of On-Road Driving on a Compact Test Track
Huseyin Harmankaya, Adrian Brietzke, Rebecca Pham-Xuan, Barys Shyrokau, Riender Happee, Georgios Papaioannou
The ability to engage in other activities during the ride is considered by
consumers as one of the key reasons for the adoption of automated vehicles.
However, engagement in non-driving activities will provoke occupants' motion
sickness, deteriorating their overall comfort and thereby risking acceptance of
automated driving. Therefore, it is critical to extend our understanding of
motion sickness and unravel the modulating factors that affect it through
experiments with participants. Currently, most experiments are conducted on
public roads (realistic but not reproducible) or test tracks (feasible with
prototype automated vehicles). This research study develops a method to design
an optimal path and speed reference to efficiently replicate on-road motion
sickness exposure on a small test track. The method uses model predictive
control to replicate the longitudinal and lateral accelerations collected from
on-road drives on a test track of 70 m by 175 m. A within-subject experiment
(47 participants) was conducted comparing the occupants' motion sickness
occurrence in test-track and on-road conditions, with the conditions being
cross-randomized. The results show no difference between conditions and no
effect of the condition on the average occurrence of motion sickness across
participants. Meanwhile, there is an overall correspondence of individual
sickness levels between the on-road and test-track conditions. This paves the way for the
employment of our method for a simpler, safer and more replicable assessment of
motion sickness.
♻ ☆ Collision-Inclusive Manipulation Planning for Occluded Object Grasping via Compliant Robot Motions
Robotic manipulation research has investigated contact-rich problems and
strategies that require robots to intentionally collide with their environment,
to accomplish tasks that cannot be handled by traditional collision-free
solutions. By enabling compliant robot motions, collisions between the robot
and its environment become more tolerable and can thus be exploited, but more
physical uncertainties are introduced. To address contact-rich problems such as
occluded object grasping while handling the involved uncertainties, we propose
a collision-inclusive planning framework that can transition the robot to a
desired task configuration via roughly modeled collisions absorbed by Cartesian
impedance control. By strategically exploiting the environmental constraints
and exploring inside a manipulation funnel formed by task repetitions, our
framework can effectively reduce physical and perception uncertainties. With
real-world evaluations on both single-arm and dual-arm setups, we show that our
framework is able to efficiently address various realistic occluded grasping
problems where a feasible grasp does not initially exist.
comment: This work has been submitted to the IEEE for possible publication
♻ ☆ Spatiotemporal Tubes for Temporal Reach-Avoid-Stay Tasks in Unknown Systems
The paper considers the controller synthesis problem for general MIMO systems
with unknown dynamics, aiming to fulfill the temporal reach-avoid-stay task,
where the unsafe regions are time-dependent, and the target must be reached
within a specified time frame. The primary aim of the paper is to construct the
spatiotemporal tube (STT) using a sampling-based approach and thereby devise a
closed-form approximation-free control strategy to ensure that the system
trajectory reaches the target set while avoiding time-dependent unsafe sets.
The proposed scheme utilizes a novel method involving STTs to provide
controllers that guarantee both system safety and reachability. In our
sampling-based framework, we translate the requirements of STTs into a robust
optimization program (ROP). To address the infeasibility of the ROP caused by
infinitely many constraints, we utilize the sampling-based scenario optimization
program (SOP). Subsequently, we solve the SOP to generate the tube and
closed-form controller for an unknown system, ensuring the temporal
reach-avoid-stay specification. Finally, the effectiveness of the proposed
approach is demonstrated through three case studies: an omnidirectional robot,
a SCARA manipulator, and a magnetic levitation system.
comment: IEEE Transactions on Automatic Control (2025)
♻ ☆ Object-Centric Kinodynamic Planning for Nonprehensile Robot Rearrangement Manipulation
Nonprehensile actions such as pushing are crucial for addressing multi-object
rearrangement problems. Many traditional methods generate robot-centric
actions, which differ from intuitive human strategies and are typically
inefficient. To this end, we adopt an object-centric planning paradigm and
propose a unified framework for addressing a range of large-scale,
physics-intensive nonprehensile rearrangement problems challenged by modeling
inaccuracies and real-world uncertainties. By assuming each object can actively
move without being driven by robot interactions, our planner first computes
desired object motions, which are then realized through robot actions generated
online via a closed-loop pushing strategy. Through extensive experiments and in
comparison with state-of-the-art baselines in both simulation and on a physical
robot, we show that our object-centric planning framework can generate more
intuitive and task-effective robot actions with significantly improved
efficiency. In addition, we propose a benchmarking protocol to standardize and
facilitate future research in nonprehensile rearrangement.
♻ ☆ Integrating Diffusion-based Multi-task Learning with Online Reinforcement Learning for Robust Quadruped Robot Control
Recent research has highlighted the powerful capabilities of imitation
learning in robotics. Leveraging generative models, particularly diffusion
models, these approaches offer notable advantages such as strong multi-task
generalization, effective language conditioning, and high sample efficiency.
While their application has been successful in manipulation tasks, their use in
legged locomotion remains relatively underexplored, mainly due to compounding
errors that affect stability and difficulties in task transition under limited
data. Online reinforcement learning (RL) has demonstrated promising results in
legged robot control in recent years, providing valuable insights to address
these challenges. In this work, we propose DMLoco, a diffusion-based framework
for quadruped robots that integrates multi-task pretraining with online PPO
finetuning to enable language-conditioned control and robust task transitions.
Our approach first pretrains the policy on a diverse multi-task dataset using
diffusion models, enabling language-guided execution of various skills. Then,
it finetunes the policy in simulation to ensure robustness and stable task
transition during real-world deployment. By utilizing Denoising Diffusion
Implicit Models (DDIM) for efficient sampling and TensorRT for optimized
deployment, our policy runs onboard at 50 Hz, offering a scalable and efficient
solution for adaptive, language-guided locomotion on resource-constrained
robotic platforms.
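For reference, the deterministic DDIM update (the case with no added sampling noise) that makes few-step sampling possible can be written as below. Here x_t is the noisy sample (in a diffusion policy this would be the action), \epsilon_\theta is the learned noise predictor, and \bar{\alpha}_t is the cumulative noise schedule in DDPM notation; t-1 denotes the previous step in whatever sub-sequence of timesteps is used. This is the standard DDIM formulation, not a detail taken from the abstract.

```latex
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{x}_0 + \sqrt{1-\bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t)
```

Because the update is deterministic, the sampler can skip timesteps, which is what reduces the number of network evaluations per control step.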
♻ ☆ Towards Developing Socially Compliant Automated Vehicles: Advances, Expert Insights, and A Conceptual Framework
Automated Vehicles (AVs) hold promise for revolutionizing transportation by
improving road safety, traffic efficiency, and overall mobility. Despite the
steady advancement in high-level AVs in recent years, the transition to full
automation entails a period of mixed traffic, where AVs of varying automation
levels coexist with human-driven vehicles (HDVs). Making AVs socially compliant
and understood by human drivers is expected to improve the safety and
efficiency of mixed traffic. Thus, ensuring AVs' compatibility with HDVs and
social acceptance is crucial for their successful and seamless integration into
mixed traffic. However, research in this critical area of developing Socially
Compliant AVs (SCAVs) remains sparse. This study carries out the first
comprehensive scoping review to assess the current state of the art in
developing SCAVs, identifying key concepts, methodological approaches, and
research gaps. An informal expert interview was also conducted to discuss the
literature review results and identify critical research gaps and expectations
towards SCAVs. Based on the scoping review and expert interview input, a
conceptual framework is proposed for the development of SCAVs. The conceptual
framework is evaluated using an online survey targeting researchers,
technicians, policymakers, and other relevant professionals worldwide. The
survey results provide valuable validation and insights, affirming the
significance of the proposed conceptual framework in tackling the challenges of
integrating AVs into mixed-traffic environments. Additionally, future research
perspectives and suggestions are discussed, contributing to the research and
development agenda of SCAVs.
comment: 23 pages, 13 figures, accepted by the Journal of Communications in
Transportation Research
♻ ☆ Agentic Vehicles for Human-Centered Mobility Systems
Autonomy, from the Greek autos (self) and nomos (law), refers to the capacity
to operate according to internal rules without external control. Autonomous
vehicles (AuVs) are therefore understood as systems that perceive their
environment and execute pre-programmed tasks independently of external input,
consistent with the SAE levels of automated driving. Yet recent research and
real-world deployments have begun to showcase vehicles that exhibit behaviors
outside the scope of this definition. These include natural language
interaction with humans, goal adaptation, contextual reasoning, external tool
use, and the handling of unforeseen ethical dilemmas, enabled in part by
multimodal large language models (LLMs). These developments highlight not only
a gap between technical autonomy and the broader cognitive and social
capacities required for human-centered mobility, but also the emergence of a
form of vehicle intelligence that currently lacks a clear designation. To
address this gap, the paper introduces the concept of agentic vehicles (AgVs):
vehicles that integrate agentic AI systems to reason, adapt, and interact
within complex environments. It synthesizes recent advances in agentic systems
and suggests how AgVs can complement and even reshape conventional autonomy to
ensure mobility services are aligned with user and societal needs. The paper
concludes by outlining key challenges in the development and governance of AgVs
and their potential role in shaping future agentic transportation systems.