Robotics 25
☆ Distilling Multi-modal Large Language Models for Autonomous Driving
Deepti Hegde, Rajeev Yasarla, Hong Cai, Shizhong Han, Apratim Bhattacharyya, Shweta Mahajan, Litian Liu, Risheek Garrepalli, Vishal M. Patel, Fatih Porikli
Autonomous driving demands safe motion planning, especially in critical
"long-tail" scenarios. Recent end-to-end autonomous driving systems leverage
large language models (LLMs) as planners to improve generalizability to rare
events. However, using LLMs at test time introduces high computational costs.
To address this, we propose DiMA, an end-to-end autonomous driving system that
maintains the efficiency of an LLM-free (or vision-based) planner while
leveraging the world knowledge of an LLM. DiMA distills the information from a
multi-modal LLM to a vision-based end-to-end planner through a set of specially
designed surrogate tasks. Under a joint training strategy, a scene encoder
common to both networks produces structured representations that are
semantically grounded as well as aligned to the final planning objective.
Notably, the LLM is optional at inference, enabling robust planning without
compromising on efficiency. Training with DiMA results in a 37% reduction in
the L2 trajectory error and an 80% reduction in the collision rate of the
vision-based planner, as well as a 44% trajectory error reduction in long-tail
scenarios. DiMA also achieves state-of-the-art performance on the nuScenes
planning benchmark.
☆ FAST: Efficient Action Tokenization for Vision-Language-Action Models
Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, Sergey Levine
Autoregressive sequence models, such as Transformer-based
vision-language-action (VLA) policies, can be tremendously effective for capturing complex and
generalizable robotic behaviors. However, such models require us to choose a
tokenization of our continuous action signals, which determines how the
discrete symbols predicted by the model map to continuous robot actions. We
find that current approaches for robot action tokenization, based on simple
per-dimension, per-timestep binning schemes, typically perform poorly when
learning dexterous skills from high-frequency robot data. To address this
challenge, we propose a new compression-based tokenization scheme for robot
actions, based on the discrete cosine transform. Our tokenization approach,
Frequency-space Action Sequence Tokenization (FAST), enables us to train
autoregressive VLAs for highly dexterous and high-frequency tasks where
standard discretization methods fail completely. Based on FAST, we release
FAST+, a universal robot action tokenizer, trained on 1M real robot action
trajectories. It can be used as a black-box tokenizer for a wide range of robot
action sequences, with diverse action spaces and control frequencies. Finally,
we show that, when combined with the pi0 VLA, our method can scale to training
on 10k hours of robot data and match the performance of diffusion VLAs, while
reducing training time by up to 5x.
comment: Website: https://www.pi.website/research/fast
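The frequency-space idea in the FAST abstract can be sketched as follows. This is a minimal illustration, not the released tokenizer: it only applies a discrete cosine transform to an action chunk and rounds the coefficients, and it omits any later compression stage. The function names, the `scale` parameter, and the synthetic action chunk are assumptions made for this example.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th frequency."""
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # time index
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)
    return basis

def encode(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Map a (T, D) action chunk to integer DCT coefficients (hypothetical scheme)."""
    coeffs = dct_matrix(actions.shape[0]) @ actions
    return np.round(coeffs * scale).astype(np.int64)

def decode(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Invert encode() up to the rounding error."""
    return dct_matrix(tokens.shape[0]).T @ (tokens / scale)

# Synthetic, smooth 50-step action chunk with two dimensions (illustration only).
t = np.linspace(0.0, 1.0, 50)
chunk = np.stack([np.sin(2 * np.pi * t), 0.5 * np.cos(np.pi * t)], axis=1)

tokens = encode(chunk)
recon = decode(tokens)
nonzero = int(np.count_nonzero(tokens))
max_err = float(np.max(np.abs(recon - chunk)))
print(f"{nonzero} of {tokens.size} coefficients are nonzero, max error {max_err:.3f}")
```

For smooth signals most high-frequency coefficients round to zero, which is what makes the integer sequence compressible; per-timestep binning has no comparable sparsity to exploit.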
☆ FLOL: Fast Baselines for Real-World Low-Light Enhancement
Low-Light Image Enhancement (LLIE) is a key task in computational photography
and imaging. The problem of enhancing images captured at night or in dark
environments has been well-studied in the image signal processing literature.
However, current deep learning-based solutions struggle with efficiency and
robustness in real-world scenarios (e.g. scenes with noise, saturated pixels,
bad illumination). We propose a lightweight neural network that combines image
processing in the frequency and spatial domains. Our method, FLOL+, is one of
the fastest models for this task, achieving state-of-the-art results on popular
real-scene datasets such as LOL and LSRW. Moreover, we are able to process
1080p images in under 12 ms. Code and models at https://github.com/cidautai/FLOL
comment: Technical Report
☆ CoNav Chair: Design of a ROS-based Smart Wheelchair for Shared Control Navigation in the Built Environment
With the number of people with disabilities (PWD) increasing worldwide each
year, the demand for mobility support to enable independent living and social
integration is also growing. Wheelchairs commonly support the mobility of PWD
in both indoor and outdoor environments. However, current powered wheelchairs
(PWC) often fail to meet the needs of PWD, who may find it difficult to operate
them. Furthermore, existing research on robotic wheelchairs typically focuses
either on full autonomy or enhanced manual control, which can lead to reduced
efficiency and user trust. To address these issues, this paper proposes a Robot
Operating System (ROS)-based smart wheelchair, called CoNav Chair, that
incorporates a shared control navigation algorithm and obstacle avoidance to
support PWD while fostering efficiency and trust between the robot and the
user. Our design consists of hardware and software components. Experimental
results conducted in a typical indoor social environment demonstrate the
performance and effectiveness of the smart wheelchair hardware and software
design. This integrated design promotes trust and autonomy, which are crucial
for the acceptance of assistive mobility technologies in the built environment.
comment: 8 pages, 9 figures
☆ Model Predictive Path Integral Docking of Fully Actuated Surface Vessel
Autonomous docking remains one of the most challenging maneuvers in marine
robotics, requiring precise control and robust perception in confined spaces.
This paper presents a novel approach integrating Model Predictive Path
Integral (MPPI) control with real-time LiDAR-based dock detection for autonomous
surface vessel docking. Our framework uniquely combines probabilistic
trajectory optimization with a multiobjective cost function that simultaneously
considers docking precision, safety constraints, and motion efficiency. The
MPPI controller generates optimal trajectories by intelligently sampling
control sequences and evaluating their costs based on dynamic clearance
requirements, orientation alignment, and target position objectives. We
introduce an adaptive dock detection pipeline that processes LiDAR point clouds
to extract critical geometric features, enabling real-time updates of docking
parameters. The proposed method is extensively validated in a physics-based
simulation environment that incorporates realistic sensor noise, vessel
dynamics, and environmental constraints. Results demonstrate successful docking
from various initial positions while maintaining safe clearances and smooth
motion characteristics.
comment: 6 pages, 6 figures, 1 table, UT2025 Conference, IEEE International
Symposium on Underwater Technology 2025
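The sampling-and-weighting step that MPPI performs can be sketched as below. This is a generic MPPI update under strong simplifying assumptions, not the paper's controller: the vessel is replaced by a 2-D single integrator, the cost is only the distance to a target point, and all names and parameter values (`mppi_update`, `sigma`, `lam`, the target) are hypothetical.

```python
import numpy as np

DT = 0.1  # integration step in seconds (assumed)

def mppi_update(x0, u_nom, target, rng, n_samples=256, sigma=0.5, lam=1.0):
    """One MPPI update of the nominal control sequence u_nom with shape (H, 2)."""
    noise = rng.normal(0.0, sigma, size=(n_samples,) + u_nom.shape)
    controls = u_nom[None] + noise                      # (K, H, 2) sampled sequences
    states = x0 + np.cumsum(controls * DT, axis=1)      # single-integrator rollouts
    costs = np.linalg.norm(states - target, axis=2).sum(axis=1)
    weights = np.exp(-(costs - costs.min()) / lam)      # low cost gives high weight
    weights /= weights.sum()
    return u_nom + np.einsum("k,khd->hd", weights, noise)

rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])
target = np.array([3.0, 2.0])       # stand-in for a detected dock position
u_nom = np.zeros((15, 2))
initial_dist = float(np.linalg.norm(x - target))

for _ in range(40):
    u_nom = mppi_update(x, u_nom, target, rng)
    x = x + u_nom[0] * DT                               # apply the first control
    u_nom = np.vstack([u_nom[1:], np.zeros((1, 2))])    # shift the plan forward

final_dist = float(np.linalg.norm(x - target))
print(f"distance to target: {initial_dist:.2f} -> {final_dist:.2f}")
```

A multiobjective cost like the one the abstract describes would replace the single distance term with a sum of clearance, orientation, and position terms evaluated along each rollout.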
☆ Monte Carlo Tree Search with Velocity Obstacles for safe and efficient motion planning in dynamic environments
Online motion planning is a challenging problem for intelligent robots moving
in dense environments with dynamic obstacles, e.g., crowds. In this work, we
propose a novel approach for optimal and safe online motion planning with
minimal information about dynamic obstacles. Specifically, our approach
requires only the current position of the obstacles and their maximum speed,
but it does not need any information about their exact trajectories or dynamic
model. The proposed methodology combines Monte Carlo Tree Search (MCTS), for
online optimal planning via model simulations, with Velocity Obstacles (VO),
for obstacle avoidance. We perform experiments in a cluttered simulated
environment with walls, and up to 40 dynamic obstacles moving with random
velocities and directions. With an ablation study, we show the key contribution
of VO in scaling up the efficiency of MCTS, selecting the safest and most
rewarding actions in the tree of simulations. Moreover, we show the superiority
of our methodology with respect to state-of-the-art planners, including
Non-linear Model Predictive Control (NMPC), in terms of collision rate,
computational performance, and task performance.
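One way to prune unsafe actions using only an obstacle's current position and maximum speed is a worst-case reachability check, sketched below. This is an assumption about how such a test could look, not the paper's velocity-obstacle construction: the obstacle is treated as a disc that may grow at its maximum speed, and the function name, `radius`, and `horizon` are hypothetical.

```python
import numpy as np

def is_velocity_safe(p_robot, v_cand, p_obs, v_max_obs, radius, horizon, n_steps=20):
    """Return False if an obstacle moving at up to v_max_obs could reach the robot.

    radius is the combined robot and obstacle radius; horizon is in seconds.
    """
    for t in np.linspace(0.0, horizon, n_steps + 1):
        gap = np.linalg.norm(p_robot + v_cand * t - p_obs)
        if gap <= radius + v_max_obs * t:
            return False
    return True

p_robot = np.array([0.0, 0.0])
p_obs = np.array([2.0, 0.0])
candidates = [np.array([1.5, 0.0]), np.array([-1.0, 0.0]), np.array([0.0, 1.0])]

# Keep only the candidate velocities that pass the check; a tree search would
# then expand and simulate just these actions.
safe = [v for v in candidates
        if is_velocity_safe(p_robot, v, p_obs, v_max_obs=0.5, radius=0.5, horizon=1.0)]
print(safe)
```

Filtering actions before expansion is the sense in which a velocity-based safety test can reduce the branching that MCTS has to explore.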
☆ Mesh2SLAM in VR: A Fast Geometry-Based SLAM Framework for Rapid Prototyping in Virtual Reality Applications
SLAM is a foundational technique with broad applications in robotics and
AR/VR. SLAM simulations evaluate new concepts, but testing on
resource-constrained devices, such as VR HMDs, faces challenges: high
computational cost and restricted sensor data access. This work proposes a
sparse framework using mesh geometry projections as features, which improves
efficiency and circumvents direct sensor data access, advancing SLAM research
as we demonstrate in VR and through numerical evaluation.
☆ Comparison of Various SLAM Systems for Mobile Robot in an Indoor Environment
This article presents a comparative analysis of mobile robot trajectories
computed by various ROS-based SLAM systems. For this purpose, we developed a
prototype of a mobile robot with common sensors: a 2D lidar, a monocular camera,
and a ZED stereo camera. Then we conducted experiments in a typical office environment
and collected data from all sensors, running all tested SLAM systems based on
the acquired dataset. We studied the following SLAM systems: (a) 2D
lidar-based: GMapping, Hector SLAM, Cartographer; (b) monocular camera-based:
Large Scale Direct monocular SLAM (LSD SLAM), ORB SLAM, Direct Sparse Odometry
(DSO); and (c) stereo camera-based: ZEDfu, Real-Time Appearance-Based Mapping
(RTAB map), ORB SLAM, Stereo Parallel Tracking and Mapping (S-PTAM). Since all
SLAM methods were tested on the same dataset we compared results for different
SLAM systems with appropriate metrics, demonstrating encouraging results for
lidar-based Cartographer SLAM, Monocular ORB SLAM and Stereo RTAB Map methods.
comment: 6 pages, 6 figures
☆ Sensorimotor Control Strategies for Tactile Robotics
How are robots becoming smarter at interacting with their surroundings?
Recent advances have reshaped how robots use tactile sensing to perceive and
engage with the world. Tactile sensing is a game-changer, allowing robots to
embed sensorimotor control strategies to interact with complex environments and
skillfully handle heterogeneous objects. Such control frameworks plan
contact-driven motions while staying responsive to sudden changes. We review
the latest methods for building perception and control systems in tactile
robotics while offering practical guidelines for their design and
implementation. We also address key challenges to shape the future of
intelligent robots.
comment: 39 pages, 8 figures, 1 table
☆ Real-Time Generation of Near-Minimum-Energy Trajectories via Constraint-Informed Residual Learning
Industrial robotics demands significant energy to operate, making
energy-reduction methodologies increasingly important. Strategies for planning
minimum-energy trajectories typically involve solving nonlinear optimal control
problems (OCPs), which rarely cope with real-time requirements. In this paper,
we propose a paradigm for generating near minimum-energy trajectories for
manipulators by learning from optimal solutions. Our paradigm leverages a
residual learning approach, which embeds boundary conditions while focusing on
learning only the adjustments needed to steer a standard solution to an optimal
one. Compared to a computationally expensive OCP-based planner, our paradigm
achieves 87.3% of the performance near the training dataset and 50.8% far from
the dataset, while being two to three orders of magnitude faster.
☆ Path Planning for a UAV Swarm Using Formation Teaching-Learning-Based Optimization
This work addresses the path planning problem for a group of unmanned aerial
vehicles (UAVs) to maintain a desired formation during operation. Our approach
formulates the problem as an optimization task by defining a set of fitness
functions that not only ensure the formation but also include constraints for
optimal and safe UAV operation. To optimize the fitness function and obtain a
suboptimal path, we employ the teaching-learning-based optimization algorithm
and then further enhance it with mechanisms such as mutation, elite strategy,
and multi-subject combination. A number of simulations and experiments have
been conducted to evaluate the proposed method. The results demonstrate that
the algorithm successfully generates valid paths for the UAVs to fly in a
triangular formation for an inspection task.
comment: in Proceedings of the 2025 International Conference on Energy,
Infrastructure and Environmental Research (EIER2025)
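For readers unfamiliar with teaching-learning-based optimization, the basic algorithm can be sketched as follows. This covers only the standard teacher and learner phases; the mutation, elite strategy, and multi-subject combination mentioned in the abstract are not included, and the sphere function stands in for the formation fitness functions, which the abstract does not specify.

```python
import numpy as np

def tlbo(fitness, low, high, pop_size=20, iters=50, seed=0):
    """Minimize fitness over the box [low, high] with basic TLBO."""
    rng = np.random.default_rng(seed)
    dim = low.shape[0]
    pop = rng.uniform(low, high, size=(pop_size, dim))
    cost = np.array([fitness(x) for x in pop])
    for _ in range(iters):
        # Teacher phase: move learners toward the best solution, away from the mean.
        teacher = pop[np.argmin(cost)].copy()
        mean = pop.mean(axis=0)
        for i in range(pop_size):
            tf = rng.integers(1, 3)  # teaching factor, 1 or 2
            cand = np.clip(pop[i] + rng.random(dim) * (teacher - tf * mean), low, high)
            c = fitness(cand)
            if c < cost[i]:
                pop[i], cost[i] = cand, c
        # Learner phase: each learner interacts with one random peer.
        for i in range(pop_size):
            j = rng.integers(pop_size - 1)
            j = j + 1 if j >= i else j  # ensure j != i
            direction = pop[i] - pop[j] if cost[i] < cost[j] else pop[j] - pop[i]
            cand = np.clip(pop[i] + rng.random(dim) * direction, low, high)
            c = fitness(cand)
            if c < cost[i]:
                pop[i], cost[i] = cand, c
    best = int(np.argmin(cost))
    return pop[best], float(cost[best])

def sphere(x):
    """Placeholder fitness; a path planner would score a candidate path here."""
    return float(np.sum(x ** 2))

best_x, best_cost = tlbo(sphere, np.full(4, -5.0), np.full(4, 5.0))
print(best_x, best_cost)
```

In a path-planning setting each individual would encode waypoints for the swarm, and the fitness would combine path length, formation error, and safety penalties.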
☆ Robust UAV Path Planning with Obstacle Avoidance for Emergency Rescue
Unmanned aerial vehicles (UAVs) are efficient tools for diverse tasks
such as electronic reconnaissance, agricultural operations and disaster relief.
In complex three-dimensional (3D) environments, path planning with
obstacle avoidance for UAVs is a significant issue for security assurance. In
this paper, we construct a comprehensive 3D scenario with obstacles and no-fly
zones for dynamic UAV trajectories. Moreover, a novel artificial potential field
algorithm coupled with simulated annealing (APF-SA) is proposed to tackle the
robust path planning problem. APF-SA modifies the attractive and repulsive
potential functions and leverages simulated annealing to escape local minima
and converge to globally optimal solutions. Simulation results demonstrate
the effectiveness of APF-SA, which enables efficient autonomous path planning for
UAVs with obstacle avoidance.
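The two ingredients named in the abstract can be sketched in their textbook forms. The code below uses the classical attractive and repulsive potential-field forces and the standard Metropolis acceptance rule from simulated annealing; the paper's modified potential functions and the way it couples the two are not given in the abstract, so the gains `k_att`, `k_rep`, the influence distance `d0`, and both function names are assumptions.

```python
import numpy as np

def apf_force(p, goal, obstacles, k_att=1.0, k_rep=1.0, d0=1.5):
    """Classical APF force: attraction to the goal plus repulsion within range d0."""
    force = k_att * (goal - p)
    for obs in obstacles:
        diff = p - obs
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:
            force = force + k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force

def sa_accept(delta_u, temperature, rng):
    """Metropolis rule: always accept a lower potential, sometimes accept a higher one."""
    return delta_u <= 0.0 or rng.random() < np.exp(-delta_u / temperature)

p = np.array([0.0, 0.0, 0.0])
goal = np.array([5.0, 0.0, 0.0])
free = apf_force(p, goal, [])
blocked = apf_force(p, goal, [np.array([1.0, 0.0, 0.0])])
print(free, blocked)
```

When the attractive and repulsive forces cancel, a pure gradient follower stalls; occasionally accepting a random move that raises the potential is the mechanism by which annealing can leave such a point.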
☆ RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects
As robotic technology rapidly develops, robots are being employed in an
increasing number of fields. However, due to the complexity of deployment
environments or the prevalence of ambiguous-condition objects, the practical
application of robotics still faces many challenges, leading to frequent
errors. Traditional methods and some LLM-based approaches, although improved,
still require substantial human intervention and struggle with autonomous error
correction in complex scenarios. In this work, we propose RoboReflect, a novel
framework leveraging large vision-language models (LVLMs) to enable
self-reflection and autonomous error correction in robotic grasping tasks.
RoboReflect allows robots to automatically adjust their strategies based on
unsuccessful attempts until successful execution is achieved. The corrected
strategies are saved in a memory for future task reference. We evaluate
RoboReflect through extensive testing on eight common objects prone to
ambiguous conditions of three categories. Our results demonstrate that
RoboReflect not only outperforms existing grasp pose estimation methods like
AnyGrasp and high-level action planning techniques using GPT-4V but also
significantly enhances the robot's ability to adapt and correct errors
independently. These findings underscore the critical importance of autonomous
self-reflection in robotic systems while effectively addressing the challenges
posed by ambiguous environments.
☆ Interoceptive Robots for Convergent Shared Control in Collaborative Construction Work
Building autonomous mobile robots (AMRs) with optimized efficiency and
adaptive capabilities, able to respond to changing task demands and dynamic
environments, is a strongly desired goal for advancing construction robotics.
Such robots can play a critical role in enabling automation, reducing
operational carbon footprints, and supporting modular construction processes.
Inspired by the adaptive autonomy of living organisms, we introduce
interoception, which centers on the robot's internal state representation, as a
foundation for developing self-reflection and conscious learning to enable
continual learning and adaptability in robotic agents. In this paper, we
factorize internal state variables and mathematical properties as "cognitive
dissonance" in shared control paradigms, where human interventions occasionally
occur. We offer a new perspective on how interoception can help build adaptive
motion planning in AMRs by integrating the legacy of heuristic costs from
grid/graph-based algorithms with recent advances in neuroscience and
reinforcement learning. Declarative and procedural knowledge extracted from
human semantic inputs is encoded into a hypergraph model that overlaps with the
spatial configuration of onsite layout for path planning. In addition, we
design a velocity-replay module using an encoder-decoder architecture with
few-shot learning to enable robots to replicate velocity profiles in
contextualized scenarios for multi-robot synchronization and handover
collaboration. These "cached" knowledge representations are demonstrated in
simulated environments for multi-robot motion planning and stacking tasks. The
insights from this study pave the way toward artificial general intelligence in
AMRs, fostering their progression from complexity to competence in construction
automation.
☆ ThinTact: Thin Vision-Based Tactile Sensor by Lensless Imaging
Vision-based tactile sensors have drawn increasing interest in the robotics
community. However, traditional lens-based designs impose minimum thickness
constraints on these sensors, limiting their applicability in space-restricted
settings. In this paper, we propose ThinTact, a novel lensless vision-based
tactile sensor with a sensing field of over 200 mm² and a thickness of less
than 10 mm. ThinTact utilizes the mask-based lensless imaging technique to map
the contact information to CMOS signals. To ensure real-time tactile sensing,
we propose a real-time lensless reconstruction algorithm that leverages a
frequency-spatial-domain joint filter based on discrete cosine transform (DCT).
This algorithm achieves computation significantly faster than existing
optimization-based methods. Additionally, to improve the sensing quality, we
develop a mask optimization method based on the genetic algorithm and the
corresponding system matrix calibration algorithm. We evaluate the performance
of our proposed lensless reconstruction and tactile sensing through qualitative
and quantitative experiments. Furthermore, we demonstrate ThinTact's practical
applicability in diverse applications, including texture recognition and
contact-rich object manipulation. The paper will appear in the IEEE
Transactions on Robotics: https://ieeexplore.ieee.org/document/10842357. Video:
https://youtu.be/YrOO9BDMAHo
comment: © 2025 IEEE. Personal use of this material is permitted.
Permission from IEEE must be obtained for all other uses, in any current or
future media, including reprinting/republishing this material for advertising
or promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component of
this work in other works
☆ Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
The construction industry has long explored robotics and computer vision, yet
their deployment on construction sites remains very limited. These technologies
have the potential to revolutionize traditional workflows by enhancing
accuracy, efficiency, and safety in construction management. Ground robots
equipped with advanced vision systems could automate tasks such as monitoring
mechanical, electrical, and plumbing (MEP) systems. The present research
evaluates the applicability of open-vocabulary vision-language models compared
to fine-tuned, lightweight, closed-set object detectors for detecting MEP
components using a mobile ground robotic platform. A dataset collected with
cameras mounted on a ground robot was manually annotated and analyzed to
compare model performance. The results demonstrate that, despite the
versatility of vision-language models, fine-tuned lightweight models still
largely outperform them in specialized environments and for domain-specific
tasks.
comment: 4 pages, 3 figures
♻ ☆ Global SLAM in Visual-Inertial Systems with 5G Time-of-Arrival Integration
This paper presents a novel approach that integrates 5G Time of Arrival (ToA)
measurements into ORB-SLAM3 to enable global localization and enhance mapping
capabilities for indoor drone navigation. We extend ORB-SLAM3's optimization
pipeline to jointly process ToA data from 5G base stations alongside visual and
inertial measurements while estimating system biases. This integration
transforms the inherently local SLAM estimates into globally referenced
trajectories and effectively resolves scale ambiguity in monocular
configurations. Our method is evaluated using five real-world indoor datasets
collected with RGB-D cameras and inertial measurement units (IMUs),
complemented by simulated 5G ToA measurements at 28 GHz and 78 GHz frequencies
using MATLAB and QuaDRiGa. Extensive experiments across four SLAM
configurations (RGB-D, RGB-D-Inertial, Monocular, and Monocular-Inertial)
demonstrate that ToA integration enables consistent global positioning across
all modes while significantly improving local accuracy in minimal sensor
setups. Notably, ToA-enhanced monocular SLAM achieves superior local accuracy
(6.3 cm average) compared to the RGB-D baseline (11.5 cm), and enables reliable
operation of monocular-inertial SLAM in scenarios where the baseline system
fails completely. While ToA integration offers limited local accuracy
improvements for sensor-rich configurations like RGB-D SLAM, it consistently
enables robust global localization.
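The measurement that a time-of-arrival term contributes to a joint optimization can be sketched as a range residual. This is a generic formulation, not the paper's factor: the function name and the single additive `range_bias` are assumptions standing in for the system biases the abstract says are estimated.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def toa_residual(position, base_station, toa_seconds, range_bias=0.0):
    """Predicted range (plus an assumed bias) minus the range implied by the ToA."""
    predicted = np.linalg.norm(position - base_station) + range_bias
    return float(predicted - C * toa_seconds)

station = np.array([0.0, 0.0, 0.0])
drone = np.array([3.0, 4.0, 0.0])
toa = 5.0 / C                       # noise-free ToA for a 5 m range

print(toa_residual(drone, station, toa))
print(toa_residual(drone, station, toa, range_bias=0.2))
```

Because each base station has a known global position, residuals of this kind anchor an otherwise local trajectory to a global frame, and since they are metric they also constrain the scale of a monocular estimate.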
♻ ☆ AeroHaptix: A Wearable Vibrotactile Feedback System for Enhancing Collision Avoidance in UAV Teleoperation
Bingjian Huang, Zhecheng Wang, Qilong Cheng, Siyi Ren, Hanfeng Cai, Antonio Alvarez Valdivia, Karthik Mahadevan, Daniel Wigdor
Haptic feedback enhances collision avoidance by providing directional
obstacle information to operators during unmanned aerial vehicle (UAV)
teleoperation. However, such feedback is often rendered via haptic joysticks,
which are unfamiliar to UAV operators and limited to single-direction force
feedback. Additionally, the direct coupling between the input device and the
feedback method diminishes operators' sense of control and induces oscillatory
movements. To overcome these limitations, we propose AeroHaptix, a wearable
haptic feedback system that uses spatial vibrations to simultaneously
communicate multiple obstacle directions to operators, without interfering with
their input control. The layout of vibrotactile actuators was optimized via a
perceptual study to eliminate perceptual biases and achieve uniform spatial
coverage. A novel rendering algorithm, MultiCBF, extended control barrier
functions to support multi-directional feedback. Our system evaluation showed
that compared to a no-feedback condition, AeroHaptix effectively reduced the
number of collisions and input disagreement. Furthermore, operators reported
that AeroHaptix was more helpful than force feedback, with improved situational
awareness and comparable workload.
♻ ☆ Learning Constraint Network from Demonstrations via Positive-Unlabeled Learning with Memory Replay
Planning for a wide range of real-world tasks requires knowing and specifying
all constraints. However, instances exist where these constraints are either
unknown or challenging to specify accurately. A possible solution is to infer
the unknown constraints from expert demonstration. The majority of prior works
limit themselves to learning simple linear constraints, or require strong
knowledge of the true constraint parameterization or environmental model. To
mitigate these problems, this paper presents a positive-unlabeled (PU) learning
approach to infer a continuous, arbitrary and possibly nonlinear, constraint
from demonstration. From a PU learning view, we treat all data in
demonstrations as positive (feasible) data, and learn a (sub)-optimal policy to
generate high-reward-winning but potentially infeasible trajectories, which
serve as unlabeled data containing both feasible and infeasible states. Under
an assumption on data distribution, a feasible-infeasible classifier (i.e.,
constraint model) is learned from the two datasets through a postprocessing PU
learning technique. The entire method employs an iterative framework
alternating between updating the policy, which generates and selects
higher-reward policies, and updating the constraint model. Additionally, a
memory buffer is introduced to record and reuse samples from previous
iterations to prevent forgetting. The effectiveness of the proposed method is
validated in two Mujoco environments, successfully inferring continuous
nonlinear constraints and outperforming a baseline method in terms of
constraint accuracy and policy safety.
♻ ☆ Positive-Unlabeled Constraint Learning for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations
Planning for diverse real-world robotic tasks requires knowing and specifying
all constraints. However, instances exist where these constraints are either
unknown or challenging to specify accurately. A possible solution is to infer
the unknown constraints from expert demonstration. This paper presents a novel
two-step Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a
continuous constraint function from demonstrations, without requiring prior
knowledge of the true constraint parameterization or environmental model,
unlike existing works. We treat all data in demonstrations as positive (feasible)
data, and learn a control policy to generate potentially infeasible
trajectories, which serve as unlabeled data. The proposed two-step learning
framework first identifies reliable infeasible data using a distance metric,
and secondly learns a binary feasibility classifier (i.e., constraint function)
from the feasible demonstrations and reliable infeasible data. The proposed
method is flexible enough to learn complex-shaped constraint boundaries and does not
mistakenly classify demonstrations as infeasible, as previous methods do. The
effectiveness of the proposed method is verified in four constrained
environments, using a networked policy or a dynamical system policy. It
successfully infers the continuous nonlinear constraints and outperforms other
baseline methods in terms of constraint accuracy and policy safety. This work
has been published in IEEE Robotics and Automation Letters (RA-L). Please refer
to the final version at https://doi.org/10.1109/LRA.2024.3522756
♻ ☆ Humanoid Robot RHP Friends: Seamless Combination of Autonomous and Teleoperated Tasks in a Nursing Context
Mehdi Benallegue, Guillaume Lorthioir, Antonin Dallard, Rafael Cisneros-Limón, Iori Kumagai, Mitsuharu Morisawa, Hiroshi Kaminaga, Masaki Murooka, Antoine Andre, Pierre Gergondet, Kenji Kaneko, Guillaume Caron, Fumio Kanehiro, Abderrahmane Kheddar, Soh Yukizaki, Junichi Karasuyama, Junichi Murakami, Masayuki Kamon
This paper describes RHP Friends, a social humanoid robot developed to enable
assistive robotic deployments in human-coexisting environments. As a use-case
application, we present its potential use in nursing by extending its
capabilities to operate human devices and tools according to the task and by
enabling remote assistance operations. To meet a wide variety of tasks and
situations in environments designed by and for humans, we developed a system
that seamlessly integrates the slim and lightweight robot and several
technologies: locomanipulation, multi-contact motion, teleoperation, and object
detection and tracking. We demonstrated the system's usage in a nursing
application. The robot efficiently performed the daily task of patient transfer
and a non-routine task, represented by a request to operate a circuit breaker.
This demonstration, held at the 2023 International Robot Exhibition (IREX),
was conducted three times a day over three days.
comment: IEEE Robotics and Automation Magazine, In press
♻ ☆ Equivariant IMU Preintegration with Biases: a Galilean Group Approach
This letter proposes a new approach for Inertial Measurement Unit (IMU)
preintegration, a fundamental building block that can be leveraged in different
optimization-based Inertial Navigation System (INS) localization solutions.
Inspired by recent advances in equivariant theory applied to biased INSs, we
derive a discrete-time formulation of the IMU preintegration on
${\mathbf{Gal}(3) \ltimes \mathfrak{gal}(3)}$, the left-trivialization of the
tangent group of the Galilean group $\mathbf{Gal}(3)$. We define a novel
preintegration error that geometrically couples the navigation states and the
bias, leading to lower linearization error. Our method improves consistency
compared to existing preintegration approaches, which treat IMU biases as a
separate state-space. Extensive validation against state-of-the-art methods,
both in simulation and with real-world IMU data, implementation in the Lie++
library, and open-source code are provided.
♻ ☆ PO-GVINS: Tightly Coupled GNSS-Visual-Inertial Integration with Pose-Only Representation
Accurate and reliable positioning is crucial for perception, decision-making,
and other high-level applications in autonomous driving, unmanned aerial
vehicles, and intelligent robots. Given the inherent limitations of standalone
sensors, integrating heterogeneous sensors with complementary capabilities is
one of the most effective approaches to achieving this goal. In this paper, we
propose a filtering-based, tightly coupled global navigation satellite system
(GNSS)-visual-inertial positioning framework with a pose-only formulation
applied to the visual-inertial system (VINS), termed PO-GVINS. Specifically,
the multiple-view imaging used in current VINS requires a priori 3D feature positions
and then jointly estimates camera poses and 3D feature positions, which inevitably
introduces linearization error of the features and suffers from dimensional
explosion. In contrast, the pose-only (PO) formulation, which has been shown to be
equivalent to multiple-view imaging and has been applied in visual
reconstruction, represents feature depth using two camera poses, so the 3D
feature position is removed from the state vector, avoiding the aforementioned
difficulties. Inspired by this, we first apply the PO formulation in our VINS,
i.e., PO-VINS. GNSS raw measurements are then incorporated with integer
ambiguity resolved to achieve accurate and drift-free estimation. Extensive
experiments demonstrate that the proposed PO-VINS significantly outperforms the
multi-state constrained Kalman filter (MSCKF). By incorporating GNSS
measurements, PO-GVINS achieves accurate, drift-free state estimation, making
it a robust solution for positioning in challenging environments.
♻ ☆ Autonomous Algorithm for Training Autonomous Vehicles with Minimal Human Intervention
Recent reinforcement learning (RL) algorithms have demonstrated impressive
results in simulated driving environments. However, autonomous vehicles trained
in simulation often struggle to work well in the real world due to the fidelity
gap between simulated and real-world environments. While directly training
real-world autonomous vehicles with RL algorithms is a promising approach to
bypass the fidelity gap problem, it presents several challenges. One critical
yet often overlooked challenge is the need to reset a driving environment
between every episode. This reset process demands significant human
intervention, leading to poor training efficiency in the real world. In this
paper, we introduce a novel autonomous algorithm that enables off-the-shelf RL
algorithms to train autonomous vehicles with minimal human intervention. Our
algorithm reduces unnecessary human intervention by aborting episodes to
prevent unsafe states and identifying informative initial states for subsequent
episodes. The key idea behind identifying informative initial states is to
estimate the expected amount of information that can be obtained from
under-explored but reachable states. Our algorithm also revisits rule-based
autonomous driving algorithms and highlights their benefits in safely returning
an autonomous vehicle to initial states. To evaluate how much human
intervention is required during training, we implement challenging urban
driving tasks that require an autonomous vehicle to reset to initial states on
its own. The experimental results show that our autonomous algorithm is
task-agnostic and achieves competitive driving performance with much less human
intervention than baselines.
comment: 8 pages, 6 figures, 2 tables, conference
♻ ☆ Gameplay Filters: Robust Zero-Shot Safety through Adversarial Imagination
Despite the impressive recent advances in learning-based robot control,
ensuring robustness to out-of-distribution conditions remains an open
challenge. Safety filters can, in principle, keep arbitrary control policies
from incurring catastrophic failures by overriding unsafe actions, but existing
solutions for complex (e.g., legged) robot dynamics do not span the full motion
envelope and instead rely on local, reduced-order models. These filters tend to
overly restrict agility and can still fail when perturbed away from nominal
conditions. This paper presents the gameplay filter, a new class of predictive
safety filter that continually plays out hypothetical matches between its
simulation-trained safety strategy and a virtual adversary co-trained to invoke
worst-case events and sim-to-real error, and precludes actions that would cause
failures down the line. We demonstrate the scalability and robustness of the
approach with a first-of-its-kind full-order safety filter for (36-D)
quadrupedal dynamics. Physical experiments on two different quadruped platforms
demonstrate the superior zero-shot effectiveness of the gameplay filter under
large perturbations such as tugging and unmodeled terrain. Experiment videos
and open-source software are available online:
https://saferobotics.org/research/gameplay-filter
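The general pattern of a predictive safety filter (roll out the proposed action followed by a fallback strategy, and override the action if the rollout fails) can be sketched on a toy system. This is not the gameplay filter: there is no learned safety strategy and no adversary, the dynamics are a 1-D double integrator approaching a wall, and every name and constant here is an assumption for illustration.

```python
import numpy as np

DT, A_MAX, WALL = 0.05, 2.0, 1.0  # time step, acceleration limit, wall position

def step(state, accel):
    """Advance the (position, velocity) state by one time step."""
    pos, vel = state
    vel = vel + accel * DT
    return (pos + vel * DT, vel)

def brake(state):
    """Fallback strategy: decelerate as hard as allowed, stopping at zero velocity."""
    return float(np.clip(-state[1] / DT, -A_MAX, A_MAX))

def is_safe(state, accel, horizon=60):
    """Imagine taking accel once, then braking; fail if the wall is ever reached."""
    state = step(state, accel)
    for _ in range(horizon):
        if state[0] >= WALL:
            return False
        state = step(state, brake(state))
    return state[0] < WALL

def filtered_action(state, task_accel):
    """Pass the task action through unless its imagined rollout fails."""
    return task_accel if is_safe(state, task_accel) else brake(state)

state = (0.0, 0.0)
max_pos = 0.0
for _ in range(200):
    # The task policy always pushes toward the wall at full acceleration.
    state = step(state, filtered_action(state, A_MAX))
    max_pos = max(max_pos, state[0])
print(f"closest approach to the wall: {WALL - max_pos:.4f}")
```

The filter intervenes only when the imagined rollout fails, so the task policy keeps full authority elsewhere; replacing the nominal rollout with one played against a worst-case disturbance is the step that the abstract's adversarial imagination adds.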