Fast-UMI

A Scalable and Hardware-Independent Universal Manipulation Interface



Physical prototypes of Fast-UMI system




Project Overview

Collecting real-world manipulation trajectory data involving robotic arms is essential for developing general-purpose action policies in robotic manipulation, yet such data remains scarce. Existing methods face limitations such as high costs, labor intensity, hardware dependencies, and complex setup requirements involving SLAM algorithms. In this work, we introduce Fast-UMI, an interface-mediated manipulation system comprising two key components: a handheld device operated by humans for data collection and a robot-mounted device used during policy inference. Our approach employs a decoupled design compatible with a wide range of grippers while maintaining consistent observation perspectives, allowing models trained on handheld-collected data to be directly applied to real robots. By directly obtaining the end-effector pose using existing commercial hardware products, we eliminate the need for complex SLAM deployment and calibration, streamlining data processing. Fast-UMI provides supporting software tools for efficient robot learning data collection and conversion, facilitating rapid, plug-and-play functionality. This system offers an efficient and user-friendly tool for robotic learning data acquisition.



Contributions



Dataset Visualization

We have collected 10,000 data samples across 20 tasks and have uploaded the dataset to Hugging Face.

Demos

A vanilla ACT (Action Chunking with Transformers) policy is employed here to demonstrate the effectiveness of the 10K training samples we collected, as well as the overall performance of our data collection system.

Note: The robot's performance in the video depends on the ACT algorithm we adopted and has little connection to the Fast-UMI system itself.







RealSense T265 Trajectory Accuracy

We evaluated the trajectory accuracy of the RealSense T265 against motion capture (MoCap) ground truth data.
Trajectory Accuracy Statistics in the 'Pick Cup' Task

| Trajectory ID | Max Error (mm) | Mean Error (mm) | Min Error (mm) |
|---------------|----------------|-----------------|----------------|
| Traj 1        | 15.669         | 9.710           | 1.968          |
| Traj 2        | 20.552         | 10.198          | 1.556          |
| Traj 3        | 19.803         | 12.235          | 5.334          |
| Traj 4        | 18.204         | 10.770          | 2.328          |
| Traj 5        | 21.698         | 11.112          | 5.359          |
| Traj 6        | 21.998         | 12.472          | 4.973          |
| Traj 7        | 17.925         | 11.457          | 4.851          |
| Traj 8        | 20.392         | 10.606          | 1.943          |
| Traj 9        | 14.394         | 6.654           | 0.798          |
| Traj 10       | 15.669         | 9.710           | 1.968          |
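The per-trajectory statistics above come from a point-wise comparison of the T265 trajectory against the MoCap one. A minimal sketch of that computation, assuming both trajectories have already been time-synchronized and expressed in the same frame (the function name and toy data are illustrative, not from the Fast-UMI codebase):

```python
import numpy as np

def trajectory_error_stats(est_xyz, gt_xyz):
    """Point-wise Euclidean position error between an estimated
    trajectory (e.g. T265) and ground truth (e.g. MoCap).

    Both inputs: (N, 3) arrays in metres, already time-aligned.
    Returns (max, mean, min) error in millimetres.
    """
    est_xyz = np.asarray(est_xyz, dtype=float)
    gt_xyz = np.asarray(gt_xyz, dtype=float)
    # Per-sample position error, converted from metres to millimetres.
    errors_mm = np.linalg.norm(est_xyz - gt_xyz, axis=1) * 1000.0
    return errors_mm.max(), errors_mm.mean(), errors_mm.min()

# Toy example: ground truth plus a constant 5 mm offset along x.
gt = np.zeros((100, 3))
est = gt + np.array([0.005, 0.0, 0.0])
print(trajectory_error_stats(est, gt))  # -> (5.0, 5.0, 5.0)
```

In practice the two streams must first be resampled to common timestamps and rigidly aligned (e.g. with a least-squares fit) before this comparison is meaningful.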


Fast-UMI 3D Model Display

Hardware (3D Printing and Purchase)

Prototype

Top Cover

GoPro Extension Arm

Fingertip for XArm

GoPro Robotic Mount

Mask Piece

T265 Mount V2



Project Leader

Dr. Yan Ding

Researcher at Shanghai AI Lab

yding25@binghamton.edu

Dr. Bin Zhao

Research Scientist at Shanghai AI Lab

bin@nwpu.edu.cn

Main Contributors

Zhaxizhuoma

Intern

Ziniu Wu

Intern

Kehui Liu

PhD Candidate

Xin Liu

Intern

Chuyue Guan

Intern

Zhongjie Jia

PhD Candidate

Tianyu Wang

Intern

Shuai Liang

PhD Candidate

Pengan Chen

Intern

Pingrui Zhang

PhD Candidate



BibTeX

@article{wu2024fast,
  title={Fast-UMI: A Scalable and Hardware-Independent Universal Manipulation Interface},
  author={Wu, Ziniu and Wang, Tianyu and Guan, Chuyue and Jia, Zhongjie and Liang, Shuai and Song, Haoming and Qu, Delin and Wang, Dong and Wang, Zhigang and Cao, Nieqing and others},
  journal={arXiv preprint arXiv:2409.19499},
  year={2024}
}