The Network is Buzzing
The World Model Bottleneck Nobody's Talking About

Everyone's racing to build world models.
- NVIDIA released Cosmos.
- Wayve just got deals with Uber and Nissan.
- Tesla's been training on their incredible fleet data for years.
- OpenAI is building their own.
- And many startups are doing the same.
The models will keep getting better. That's not the question.
The question is: where does the training data come from?
The Physical World Isn't the Web
LLMs got to where they are because the internet handed them trillions of tokens. Every blog post, every Wikipedia article, every Reddit thread, every book ever scanned — it was all just sitting there, waiting to be ingested.
The physical world doesn't work like that.
There's no "Common Crawl" for driving data. No open-source visual archive of every road, every intersection, and every weather condition, refreshed each day. The real world isn't visually indexed, and it doesn't refresh itself.
This is the bottleneck.
World models will continue to improve. The architecture breakthroughs will keep coming. Compute will keep scaling. Training techniques will get more efficient. None of it matters without data. And unlike the web, physical world data doesn't accumulate passively. It doesn't sit on servers waiting to be scraped. Someone has to go out and collect it. Every single day.
GPUs Are Table Stakes. Data Is the Bottleneck.
NVIDIA announced "Physical AI Open Datasets" with 1,700 hours of driving data. It's a meaningful contribution to the field.
But let's be honest about the scale of what's needed.
1,700 hours is one dashcam running for about 70 days. The diversity of conditions, geographies, and edge cases in that dataset is inherently limited. Not because NVIDIA didn't try, but because collecting physical world data at scale is genuinely hard.
This is where Bee Maps fits in. Our global fleet has mapped 37% of the world's roads. We capture vast amounts of real-world driving video — not simulation, not test routes, but actual roads in actual conditions.
More importantly, we're capturing the moments that matter most for training: harsh braking, swerving, high g-force maneuvers, near misses. The edge cases that world models need to understand but rarely see in curated datasets.
Video Alone Is Not Enough
Here's where I think most world model training pipelines will hit a wall: they treat driving data as a video problem.
It's not. It's a sensor fusion problem.
A 10-second clip of a car swerving is useful. That same clip with synchronized telemetry — speed at each frame, g-force profile, GPS coordinates, heading — is training data.
World models need to understand physics, not just pixels. They need to learn that a 0.4g lateral acceleration on wet pavement means something different than the same force on dry concrete. They need the ground truth that only comes from real sensors on real roads.
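To see why, here's a back-of-the-envelope sketch. The maximum lateral acceleration a tire can hold is roughly the friction coefficient times g, so the same 0.4g maneuver consumes a very different share of the available grip depending on the surface. The friction coefficients below are ballpark textbook assumptions, not measured values:

```python
# Back-of-the-envelope only: mu values are rough assumptions
# (~0.5 for wet asphalt, ~0.8 for dry concrete), not measurements.
def grip_fraction(lateral_g: float, mu: float) -> float:
    """Fraction of the tire's friction budget (mu * g) used by a lateral acceleration."""
    return lateral_g / mu

for surface, mu in [("wet asphalt", 0.5), ("dry concrete", 0.8)]:
    print(f"0.4g lateral on {surface}: ~{grip_fraction(0.4, mu):.0%} of available grip")
# -> roughly 80% of grip on wet asphalt (near the limit) vs 50% on dry concrete.
```

Same pixels, very different physical situation. Without the telemetry, a model has no way to tell them apart.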
Our AI Event Videos API doesn't just return video clips. Each event includes:
- High-resolution GNSS data (position, altitude, timestamp at 30 Hz)
- IMU sensor streams (accelerometer, gyroscope)
- Event classification (harsh braking, swerving, speeding, aggressive acceleration)
- Speed profiles throughout the event
- Geographic and temporal metadata
This is what supervised learning actually requires. Labeled, sensor-rich, geographically diverse training data at scale.
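Concretely, one of those event records could be modeled something like this. The field names and types below are illustrative assumptions for the sake of the sketch, not the actual API schema:

```python
# Illustrative only: field names and types are assumptions, not the actual
# Bee Maps API schema. The point is what a sensor-fused training sample
# carries beyond raw pixels.
from dataclasses import dataclass

@dataclass
class GnssFix:
    timestamp: float                    # Unix time, seconds
    lat: float
    lon: float
    altitude_m: float

@dataclass
class ImuSample:
    timestamp: float
    accel: tuple[float, float, float]   # m/s^2 (x, y, z)
    gyro: tuple[float, float, float]    # rad/s (x, y, z)

@dataclass
class DrivingEvent:
    event_type: str                     # e.g. "harsh_braking", "swerving"
    video_url: str                      # clip of the event
    gnss: list[GnssFix]                 # high-rate position track (e.g. 30 Hz)
    imu: list[ImuSample]                # accelerometer + gyroscope streams
    speed_profile: list[float]          # speed per frame, m/s
    region: str                         # geographic metadata
    captured_at: str                    # temporal metadata (ISO 8601)
```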
The Events Video API Is Live
If you're working on world models and need real-world driving data, we built this for you:
Query by event type. Filter by geography. Specify date ranges. Get back video URLs with full sensor context.
Pull the raw telemetry. Train on real physics.
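As a rough illustration of that workflow, a client-side sketch might look like the following. The endpoint URL, parameter names, and response fields are placeholders assumed for illustration, not the real API surface; the docs define the actual contract:

```python
# Hypothetical sketch of querying an events API for training clips.
# The URL, parameters, and response fields are placeholders, not the
# actual Bee Maps API; they only illustrate the query-then-train flow.
import requests

API_BASE = "https://example.com/events"    # placeholder endpoint
API_KEY = "YOUR_API_KEY"

params = {
    "event_type": "harsh_braking",         # query by event type
    "bbox": "-122.52,37.70,-122.35,37.83", # filter by geography (lon/lat bounds)
    "start": "2025-01-01",                 # date range
    "end": "2025-03-31",
}

resp = requests.get(
    API_BASE,
    params=params,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()

for event in resp.json().get("events", []):
    # Each record pairs a video clip with its synchronized telemetry.
    video_url = event["video_url"]
    telemetry = event["telemetry"]         # GNSS + IMU + speed profile
    # ...download the clip and feed (frames, telemetry) into the training pipeline
```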
The model architectures will keep improving. The compute will keep scaling. But the teams that pull ahead will be the ones who solve the data supply chain first.
That's the bottleneck we're working on.


