Fast-UMI

A Scalable and Hardware-Independent Universal Manipulation Interface

† * Equal contribution, ‡ Project Leader

1 Shanghai AI Lab   2 Xi'an Jiaotong-Liverpool University
3 Northwestern Polytechnical University   4 Institute of AI, China Telecom Corp Ltd



Physical prototypes of the Fast-UMI system




Project Overview

Collecting real-world manipulation trajectory data involving robotic arms is essential for developing general-purpose action policies in robotic manipulation, yet such data remains scarce. Existing methods face limitations such as high costs, labor intensity, hardware dependencies, and complex setup requirements involving SLAM algorithms. In this work, we introduce Fast-UMI, an interface-mediated manipulation system comprising two key components: a handheld device operated by humans for data collection and a robot-mounted device used during policy inference. Our approach employs a decoupled design compatible with a wide range of grippers while maintaining consistent observation perspectives, allowing models trained on handheld-collected data to be directly applied to real robots. By directly obtaining the end-effector pose using existing commercial hardware products, we eliminate the need for complex SLAM deployment and calibration, streamlining data processing. Fast-UMI provides supporting software tools for efficient robot learning data collection and conversion, facilitating rapid, plug-and-play functionality. This system offers an efficient and user-friendly tool for robotic learning data acquisition.
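As a rough illustration of the data collection and conversion tools mentioned above, the sketch below stores one episode of synchronized camera frames, end-effector poses, and gripper widths in an HDF5 file. The file layout, dataset keys, and the record_episode helper are hypothetical placeholders, not the released Fast-UMI format.

# Minimal sketch of converting a handheld-collected episode into an HDF5 file.
# The dataset layout and key names are hypothetical, not the Fast-UMI release format.
import h5py
import numpy as np

def record_episode(path, frames, poses, gripper_widths):
    """Save synchronized GoPro frames, end-effector poses, and gripper widths.

    frames:          (T, H, W, 3) uint8 RGB images
    poses:           (T, 7) arrays of [x, y, z, qx, qy, qz, qw]
    gripper_widths:  (T,) gripper opening in meters
    """
    with h5py.File(path, "w") as f:
        f.create_dataset("observations/images", data=np.asarray(frames), compression="gzip")
        f.create_dataset("observations/ee_pose", data=np.asarray(poses))
        f.create_dataset("actions/gripper_width", data=np.asarray(gripper_widths))

# Example usage with dummy data (10 timesteps of 480x640 images):
if __name__ == "__main__":
    T = 10
    record_episode(
        "episode_0000.hdf5",
        frames=np.zeros((T, 480, 640, 3), dtype=np.uint8),
        poses=np.zeros((T, 7), dtype=np.float32),
        gripper_widths=np.zeros(T, dtype=np.float32),
    )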



Demos

Demo 1: Open a fridge (policy trained with ACT).


Demo 2: Rearrange a cup (policy trained with ACT).


Demo 3: Open a fridge (policy trained with ACT).


Demo 4: Rearrange a coke can (policy trained with ACT).



More demos will be released soon!




Newly Designed Components

Components of the newly designed Fast-UMI prototype device

Visual Alignment and Consistency

To ensure visual consistency between the handheld and robot-mounted devices, we established a visual alignment guideline: the bottom of the GoPro's fisheye lens image aligns with the bottom of the gripper's fingertips.

Note: After initial trials, we found the marker was not always fully recognized due to lens distortion; the new version uses a smaller, more centered marker design.


The views captured by the GoPro cameras on the handheld device and the robot-mounted device, respectively, with the red dashed line indicating the ends of the fingertips.
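To check this alignment quickly in practice, one can overlay a horizontal reference line near the bottom of the live camera image and visually confirm that it touches the fingertip ends. The sketch below is a generic OpenCV example of such an overlay; the capture index and the row fraction are placeholder assumptions, not part of the Fast-UMI tooling.

# Minimal sketch: overlay a horizontal reference line on a camera stream to
# visually check that the fingertip ends sit at the expected image row.
# The capture index (0) and the 0.95 row fraction are placeholder assumptions.
import cv2

cap = cv2.VideoCapture(0)   # GoPro exposed as a webcam or via a capture card
row_fraction = 0.95         # fraction of image height where the line is drawn

while True:
    ok, frame = cap.read()
    if not ok:
        break
    y = int(frame.shape[0] * row_fraction)
    cv2.line(frame, (0, y), (frame.shape[1] - 1, y), (0, 0, 255), 2)  # red reference line
    cv2.imshow("alignment check", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()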

RealSense T265 Trajectory Accuracy

The T265 trajectory exhibits an average positional error of 0.0237 m relative to MoCap ground truth. While the T265 provides pose estimation accurate enough for many robotic manipulation tasks, it has inherent biases and variances that should be accounted for in precision-critical applications.
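For reference, the 6-DoF pose used here can be read from the T265 with a short pyrealsense2 loop like the one below. This is generic librealsense usage, not the Fast-UMI collection code.

# Minimal sketch: stream 6-DoF poses from a RealSense T265 with pyrealsense2.
import pyrealsense2 as rs

pipe = rs.pipeline()
cfg = rs.config()
cfg.enable_stream(rs.stream.pose)   # T265 pose stream
pipe.start(cfg)

try:
    for _ in range(100):                        # read a short burst of poses
        frames = pipe.wait_for_frames()
        pose_frame = frames.get_pose_frame()
        if not pose_frame:
            continue
        data = pose_frame.get_pose_data()
        t, q = data.translation, data.rotation  # meters, quaternion (x, y, z, w)
        print(f"pos=({t.x:.3f}, {t.y:.3f}, {t.z:.3f})  "
              f"quat=({q.x:.3f}, {q.y:.3f}, {q.z:.3f}, {q.w:.3f})")
finally:
    pipe.stop()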

Note: After extensive data collection, we identified several techniques for improving T265 accuracy. In the new version, the SLAM error is well below 2 cm, achieved by using a more visually detailed background, carefully limiting speed and acceleration during collection, and other refinements.


Evaluation of the RealSense T265 trajectory accuracy compared to motion capture (MoCap) ground truth data.
(a) Spatial trajectories along three axes: T265 measurements (red lines) and MoCap ground truth (green lines). (b) Positional errors of the T265 sensor relative to MoCap along the three axes.
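A hedged sketch of how such error figures can be computed is given below: given time-synchronized T265 and MoCap trajectories already expressed in a common frame, it reports the per-axis mean absolute errors and the mean Euclidean positional error. The time alignment and frame registration steps are assumed to have been done beforehand.

# Minimal sketch: compare a T265 trajectory to MoCap ground truth.
# Assumes both trajectories are already time-synchronized and expressed in the
# same coordinate frame; only the error statistics are computed here.
import numpy as np

def trajectory_errors(t265_xyz, mocap_xyz):
    """t265_xyz, mocap_xyz: (T, 3) arrays of positions in meters."""
    diff = np.asarray(t265_xyz) - np.asarray(mocap_xyz)
    per_axis_mae = np.abs(diff).mean(axis=0)               # mean |error| per axis (x, y, z)
    mean_pos_error = np.linalg.norm(diff, axis=1).mean()   # mean Euclidean error
    return per_axis_mae, mean_pos_error

# Example with dummy data: 1000 samples with ~1 cm noise on each axis.
rng = np.random.default_rng(0)
gt = rng.uniform(-0.5, 0.5, size=(1000, 3))
est = gt + rng.normal(0.0, 0.01, size=gt.shape)
per_axis, mean_err = trajectory_errors(est, gt)
print("per-axis MAE [m]:", per_axis, " mean positional error [m]:", mean_err)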

Fast-UMI 3D Model Display

Prototype

Top Cover

GoPro Extension Arm

Fingertip for XArm

GoPro Robotic Mount

Mask Piece

T265 Mount V2



Contact

Dr. Yan Ding

Researcher at Shanghai AI Lab

yding25@binghamton.edu