Paper List

OpenVLA Series

OpenVLA: An Open-Source Vision-Language-Action Model

VLA foundation model for embodied manipulation

2024-06

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success

VLA foundation model for embodied manipulation

2025-02

RDT Series

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Foundation model for coordinated bimanual manipulation

2024-10

H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

A more data-efficient foundation model for coordinated bimanual manipulation

2025-07

RDT2: Enabling Zero-Shot Cross-Embodiment Generalization by Scaling Up UMI Data

Scales up UMI data to further generalize RDT's capabilities, including zero-shot generalization.

2025-09

TikTok GR Series

Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

A model from ByteDance built on large-scale video pre-training

2023-12

GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation

A model from ByteDance built on large-scale video pre-training

2024-10

GR-3 Technical Report

A model from ByteDance built on large-scale video pre-training

2025-07

Google-Research Series

RT-1: Robotics Transformer for Real-World Control at Scale

Key VLA work in the RT series

2022-12

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Key VLA work in the RT series

2023-07

Self-Improving Embodied Foundation Models

Self-improving foundation models that achieve a data flywheel.

2025-09

PaLM-E Series

PaLM-E: An Embodied Multimodal Language Model

Key work in the PaLM-E series

2023-03

Meta-AI Series

R3M: A Universal Visual Representation for Robot Manipulation

Key work in the Meta-AI series

2022-03

π Series

π0: A Vision-Language-Action Flow Model for General Robot Control

Key VLA work in the PI series

2024-10

π0.5: a Vision-Language-Action Model with Open-World Generalization

Key VLA work in the PI series

2025-04

Being-Beyond Series

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

Builds a foundation model for embodied manipulation from existing large-scale data

2025-07

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

Builds a foundation model for embodied manipulation from existing large-scale data

2025-03

Agibot Series

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems

Key work in the Agibot series

2025-03

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Key work in the Agibot series

2025-08

Octo Series

Octo: An Open-Source Generalist Robot Policy

Key work in the Octo series

2024-05

Embodied-R1 Series

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

Applies an R1-style training method to embodied reasoning

2025-08

Galaxea Series

Galaxea Open-World Dataset & G0 Dual-System VLA Model

Galaxea's first dual-system VLA model and open-source dataset

2025-08

X Square Robot Series

Igniting VLMs toward the Embodied Space

A unified reasoning-action model proposed by X Square Robot.

2025-09

1X Series

1X World Model: Evaluating Bits, not Atoms

World model from the 1X series

2025-08

NVIDIA GR00T Series

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

The first work in the NVIDIA GR00T series, implementing a dual-system ("big brain, small brain") model architecture

2025-03

GR00T N1.5: An Improved Open Foundation Model for Generalist Humanoid Robots

Builds on GR00T to enable training on a broader range of humanoid robots.

2025-06

Last updated: Aug. 21, 2025
