Paper List
A curated collection of influential research papers in robotics, computer vision, and machine learning
OpenVLA Series#
OpenVLA: An Open-Source Vision-Language-Action Model
VLA foundation model for embodied manipulation
2024-06

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
VLA foundation model for embodied manipulation
2025-02

RDT Series#
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Foundation model for coordinated bimanual manipulation
2024-10

H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation
A more data-efficient foundation model for coordinated bimanual manipulation
2025-07

RDT2: Enabling Zero-Shot Cross-Embodiment Generalization by Scaling Up UMI Data
Uses UMI data to further extend RDT's generalization, including zero-shot generalization.
2025-09

ByteDance GR Series#
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
ByteDance's model based on large-scale video pre-training
2023-12

GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
ByteDance's model based on large-scale video pre-training
2024-10

GR-3 Technical Report
ByteDance's model based on large-scale video pre-training
2025-07

Google Research Series#
RT-1: Robotics Transformer for Real-World Control at Scale
Key VLA work in the RT series
2022-12

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Key VLA work in the RT series
2023-07

Self-Improving Embodied Foundation Models
Self-improving foundation model that achieves a data flywheel.
2025-09

PaLM-E Series#
PaLM-E: An Embodied Multimodal Language Model
Key work in the PaLM-E series
2023-03

Meta AI Series#
R3M: A Universal Visual Representation for Robot Manipulation
Key work in the Meta AI series
2022-03

π Series#
π0: A Vision-Language-Action Flow Model for General Robot Control
Key VLA work in the PI series
2024-10

π0.5: a Vision-Language-Action Model with Open-World Generalization
Key VLA work in the PI series
2025-04

Being-Beyond Series#
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
Builds a foundation model for embodied manipulation from existing large-scale data
2025-07

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
Builds a foundation model for embodied manipulation from existing large-scale data
2025-03

Agibot Series#
AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
Key work in the Agibot series
2025-03

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
Key work in the Agibot series
2025-08

Octo Series#
Octo: An Open-Source Generalist Robot Policy
Key work in the Octo series
2024-05

Embodied-R1 Series#
Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation
Follows R1's training method to perform embodied reasoning
2025-08

Galaxea Series#
Galaxea Open-World Dataset & G0 Dual-System VLA Model
Galaxea's first dual-system VLA model and open-source dataset
2025-08

X Square Robot Series#
Igniting VLMs toward the Embodied Space
Unified reasoning-and-action model proposed by X Square Robot.
2025-09

1X Series#
1X World Model: Evaluating Bits, not Atoms
World model in the 1X series
2025-08

NVIDIA GR00T Series#
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
First work in the NVIDIA GR00T series; implements a dual-system ("cerebrum and cerebellum") model architecture
2025-03

GR00T N1.5: An Improved Open Foundation Model for Generalist Humanoid Robots
Builds on GR00T to enable training on a broader range of humanoid robots.
2025-06

Last updated: Aug. 21, 2025