One Brain, Many Bodies: The Race to Build Robot Foundation Models

For decades, a robot was only ever as smart as the single task it was programmed to repeat. Teaching it something new meant months of hand-coding. In 2026 that assumption is collapsing, thanks to a fast-moving idea the industry now calls "Physical AI": instead of writing a fresh program for every chore, you train one large model that can see, reason and act — then drop it into many different machines.

From chatbots to "vision-language-action"

The breakthrough is a new class of model known as the vision-language-action model, or VLA. Where a chatbot turns text into text, a VLA turns camera images and a spoken instruction into motor commands. Google DeepMind's Gemini Robotics, built on its Gemini model, pairs a VLA with a separate "embodied reasoning" system for spatial understanding; the company reported that the VLA more than doubled the performance of rival models on a generalisation benchmark, and it has since released a version compact enough to run on the robot itself.

NVIDIA is pushing the same idea as an open platform. Its Isaac GR00T, billed as the first open foundation model for humanoid robots, uses a dual system: one part interprets the scene and the instruction, the other generates fluid movement in real time. To feed it, NVIDIA says it generated 780,000 synthetic training runs — the equivalent of about nine months of human demonstrations — in roughly eleven hours of simulation.

The money follows the model

Investors have noticed. Physical Intelligence, a two-year-old San Francisco startup behind the open π0 "generalist policy," raised a reported $600 million at a $5.6 billion valuation. By early 2026 it was said to be in talks for roughly $1 billion more at a valuation above $11 billion, with backers reportedly including Jeff Bezos and OpenAI. The pitch is seductive: a single model that learns to fold laundry, sort parts and clear a table, then transfers those skills to bodies it has never controlled.

The caveats are real. These models still fumble unfamiliar objects, demand enormous compute, and lean heavily on simulated data that may not match a messy kitchen. But the direction of travel is clear. The hard problem in robotics is shifting from building better limbs to building a better brain — and for the first time, that brain looks like something you can download.
