Fast-UMI
A Scalable and Hardware-Independent Universal Manipulation Interface
Physical prototypes of the Fast-UMI system
Project Overview
Collecting real-world manipulation trajectory data involving robotic arms is essential for developing general-purpose action policies in robotic manipulation, yet such data remains scarce. Existing methods face limitations such as high costs, labor intensity, hardware dependencies, and complex setup requirements involving SLAM algorithms. In this work, we introduce Fast-UMI, an interface-mediated manipulation system comprising two key components: a handheld device operated by humans for data collection and a robot-mounted device used during policy inference. Our approach employs a decoupled design compatible with a wide range of grippers while maintaining consistent observation perspectives, allowing models trained on handheld-collected data to be directly applied to real robots. By directly obtaining the end-effector pose using existing commercial hardware products, we eliminate the need for complex SLAM deployment and calibration, streamlining data processing. Fast-UMI provides supporting software tools for efficient robot learning data collection and conversion, facilitating rapid, plug-and-play functionality. This system offers an efficient and user-friendly tool for robotic learning data acquisition.
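As a concrete illustration of the "no SLAM" design, the sketch below reads 6-DoF odometry directly from a RealSense T265 through the official `pyrealsense2` bindings. It is a minimal example of the off-the-shelf pose source the system builds on, not Fast-UMI's actual data-collection code.

```python
# Minimal sketch: stream 6-DoF pose from a RealSense T265 via pyrealsense2.
# Illustrates the off-the-shelf odometry idea; not Fast-UMI's own code.
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.pose)  # the T265 exposes a dedicated pose stream
pipeline.start(config)

try:
    while True:
        frames = pipeline.wait_for_frames()
        pose_frame = frames.get_pose_frame()
        if pose_frame:
            data = pose_frame.get_pose_data()
            # Translation in meters, rotation as a quaternion (x, y, z, w),
            # both expressed in the T265's world frame.
            print("t =", (data.translation.x, data.translation.y, data.translation.z),
                  "q =", (data.rotation.x, data.rotation.y, data.rotation.z, data.rotation.w))
finally:
    pipeline.stop()
```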
Contributions
- Fast-UMI is adaptable to most robotic arms, thanks to a series of dedicated hardware and software designs.
- Fast-UMI employs the Intel RealSense T265 for visual odometry, which mitigates visual occlusion issues to some extent; this is critical for tasks involving articulated objects.
- Fast-UMI supports modular sensor designs; versions 2.0 and 3.0 are set to launch soon.
- Fast-UMI integrates with various imitation learning algorithms, supporting prediction of both the relative position of the robotic arm's end-effector and the joint states (see the sketch after this list).
- We have open-sourced a dataset containing 10,000 samples across 20 tasks, and we will progressively release datasets at the 100,000-sample scale.
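To make the relative end-effector actions above concrete, here is a minimal sketch that converts a sequence of absolute end-effector poses (e.g., from the T265) into per-step relative actions. The array layout and the function `relative_ee_actions` are illustrative assumptions, not Fast-UMI's actual data format.

```python
# Illustrative sketch (assumed data layout, not Fast-UMI's actual format):
# convert absolute end-effector poses into per-step relative actions.
import numpy as np
from scipy.spatial.transform import Rotation as R

def relative_ee_actions(positions, quaternions):
    """positions: (N, 3) xyz in meters; quaternions: (N, 4) xyzw.
    Returns (N-1, 7) actions: delta-xyz in the previous end-effector
    frame, followed by the relative rotation as an xyzw quaternion."""
    actions = []
    for i in range(len(positions) - 1):
        r_prev = R.from_quat(quaternions[i])
        # Express the translation step in the previous end-effector frame.
        dpos = r_prev.inv().apply(positions[i + 1] - positions[i])
        # Relative rotation from step i to step i+1.
        drot = (r_prev.inv() * R.from_quat(quaternions[i + 1])).as_quat()
        actions.append(np.concatenate([dpos, drot]))
    return np.stack(actions)
```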
Dataset Visualization
We have collected 10,000 data samples across 20 tasks and have uploaded the dataset to Hugging Face.
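As a usage hint, the snippet below fetches a dataset snapshot with the `huggingface_hub` client; the repository ID shown is a placeholder, so substitute the actual repo name from the project's Hugging Face page.

```python
# Minimal sketch: download a dataset snapshot from Hugging Face.
# "ORG/FastUMI-dataset" is a placeholder repo id, not the real one.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ORG/FastUMI-dataset",  # hypothetical; check the project page
    repo_type="dataset",
)
print("Dataset downloaded to:", local_dir)
```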
Demos
A naive ACT algorithm is employed here to demonstrate the effectiveness of the 10K training samples we collected, as well as the overall performance of our data collection system.
Note: The robot's performance in the video depends on the ACT algorithm we adopted and has little connection to the Fast-UMI system itself.
RealSense T265 Trajectory Accuracy
We evaluated the trajectory accuracy of the RealSense T265 against motion capture (MoCap) ground truth data.

Trajectory ID | Max Error (mm) | Mean Error (mm) | Min Error (mm) |
---|---|---|---|
Traj 1 | 15.669 | 9.710 | 1.968 |
Traj 2 | 20.552 | 10.198 | 1.556 |
Traj 3 | 19.803 | 12.235 | 5.334 |
Traj 4 | 18.204 | 10.770 | 2.328 |
Traj 5 | 21.698 | 11.112 | 5.359 |
Traj 6 | 21.998 | 12.472 | 4.973 |
Traj 7 | 17.925 | 11.457 | 4.851 |
Traj 8 | 20.392 | 10.606 | 1.943 |
Traj 9 | 14.394 | 6.654 | 0.798 |
Traj 10 | 15.669 | 9.710 | 1.968 |
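For reference, per-trajectory numbers like those above can be obtained as point-wise Euclidean distances between the T265 track and the MoCap ground truth. The sketch below assumes the two trajectories are already time-synchronized, of equal length, and expressed in a common frame; it is an illustrative recipe, not the evaluation script used here.

```python
# Illustrative sketch: point-wise trajectory error against MoCap ground truth.
# Assumes both trajectories are time-synchronized, equal length, and in a
# common reference frame (alignment/synchronization is not shown here).
import numpy as np

def trajectory_errors(t265_xyz, mocap_xyz):
    """Both inputs: (N, 3) positions in meters. Returns errors in millimeters."""
    err_mm = np.linalg.norm(t265_xyz - mocap_xyz, axis=1) * 1000.0
    return err_mm.max(), err_mm.mean(), err_mm.min()

# Example usage:
# max_e, mean_e, min_e = trajectory_errors(t265_xyz, mocap_xyz)
```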
Fast-UMI 3D Model Display
Hardware (3D Printing and Purchase)
Prototype
Top Cover
GoPro Extension Arm
Fingertip for xArm
GoPro Robotic Mount
Mask Piece
T265 Mount V2
Project Leader
Main Contributors
PhD Candidate