The mapping trilemma
If you spend time thinking about how maps get made — really thinking about it, from first principles — you run into a problem that we think is underappreciated by most people in the industry.
There are basically three ways to crowdsource map data at scale: smartphones, consumer dashcams, and automotive sensors embedded in modern vehicles. Each approach has a genuine strength. And each one forces a deep, structural compromise that no amount of software cleverness can fully overcome.
Smartphones are everywhere. Billions of them. As a distribution mechanism, nothing comes close. But smartphone cameras were designed to take beautiful photos of people and food, not to read a speed limit sign from 50 meters away at highway speed. The GPS receivers are cheap and imprecise. The lenses vary wildly across devices, which means your computer vision models have to cope with an enormous range of distortion, color, and contrast characteristics. And perhaps most importantly, the user experience is terrible — people have to mount their phone, open an app, and actively choose to map every time they get in the car. The novelty wears off. Every smartphone mapping project we're aware of has eventually lost momentum for this reason.
Consumer dashcams solve the passivity problem — mount it once and forget about it. But dashcam manufacturers are building for a consumer who wants to record accidents for insurance purposes. They're optimizing for price, not for mapping-grade positioning or onboard compute. The GPS is basic. There's no stereo depth. There's certainly no neural processing unit running object detection on every frame. You get video, but you don't get understanding.
Automotive ADAS modules are actually quite impressive from a hardware perspective. Modern cars have capable cameras, good positioning, and real compute power behind the rearview mirror. The problem is access. Automakers have been reluctant to share imagery, and the bandwidth available through in-car telematics is too constrained to upload rich visual data. So you get detections — "there is a speed limit sign somewhere around here" — but without the imagery to verify, audit, or train better models. It's a firehose of assertions with no evidence attached.
This is what we think of as the mapping trilemma: rich imagery, consistent data quality, and scalable coverage. Pick any of the three collection approaches and you get, at best, two of those properties. The third is structurally out of reach.
We decided to break the trilemma. And the only way to do that, we concluded, was to build our own hardware.
Why hardware matters more than people think
There's a line from Alan Kay that we come back to often:
"People who are really serious about software should make their own hardware." — Alan Kay
We think most technologists treat this as a nice aphorism, but we took it as an engineering directive.
The reason is subtle but important. When you're trying to extract precise, reliable, real-time spatial intelligence from a moving vehicle, every component in the system interacts with every other component. The quality of your GPS antenna affects the accuracy of your object positioning. The baseline distance between your stereo cameras determines the precision of your depth estimation. The synchronization between your GNSS clock, your camera shutter, and your IMU samples determines whether your sensor fusion is coherent or noisy. These aren't independent subsystems you can optimize separately — they're a tightly coupled system, and the overall quality is determined by the weakest link.
When you build on someone else's hardware — a smartphone, a commodity dashcam — you inherit their design tradeoffs, which were made for entirely different reasons. A smartphone antenna is tiny because the phone needs to fit in your pocket. A dashcam GPS module is cheap because the manufacturer needs to hit a $49 price point. These constraints are perfectly rational for those products. They're just incompatible with building great maps.
So we designed the Bee from scratch, selecting every component for one purpose: turning the act of driving into high-quality, real-time map data.

The device, in detail
We want to be specific here, because we think the details matter and because vague claims about "advanced technology" are a red flag in this industry.
| Specification | Details |
|---|---|
| Main camera | Sony IMX577, 12.3 MP, 4K (3840 x 2160) at 30 fps |
| Field of view | 142° horizontal, 68° vertical |
| Stereo depth cameras | 1280 x 800, 81° HFOV, 13 cm baseline |
| GNSS | Dual-band L1/L5 (u-blox F10N), lane-level positioning |
| GNSS antenna | 25 x 25 x 8 mm active antenna, 70 x 70 mm ground plane |
| GNSS rate | 10 Hz, hardware-synced to camera and IMU |
| IMU | 6-axis (3-axis accelerometer + 3-axis gyroscope), up to 100 Hz |
| On-device AI | 5.1 TOPS NPU, ~100 road feature types detected in real time |
| Privacy | On-device face and license plate blurring before upload |
| Storage | 64 GB internal flash |
| Connectivity | LTE (always-on) + WiFi (2.4 GHz) |
| Ports | Two USB-C (power and data) |
| Video bitrate | 1.5 Mbps to 5 Mbps |
| Recording | Up to 10 hours continuous |
The Bee's main camera uses a Sony IMX577 sensor — 12.3 megapixels, recording 4K video (3840 x 2160) at 30 frames per second. The horizontal field of view is 142 degrees and the vertical is 68 degrees. This is substantially wider than what you'd find in a typical dashcam or smartphone, and it means the Bee captures the full road environment — signage on both sides of the road, lane markings, buildings, infrastructure — in a single pass.
Alongside the main camera sit two stereo depth cameras with a 13-centimeter baseline, each at 1280 x 800 resolution with an 81-degree field of view. By comparing the slight differences between what each camera sees, the system calculates a depth map of the entire scene. The 13-centimeter physical separation between the cameras is critical. Smartphones technically have multiple lenses, but they're squeezed together so tightly that meaningful stereo depth estimation is impractical. You need physical distance between the sensors. There's no software shortcut around this.
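For rectified stereo, depth follows the standard relation Z = f * B / d, where f is the focal length in pixels, B the baseline, and d the disparity; the first-order depth uncertainty grows as Z^2 / (f * B) times the disparity noise, so shrinking the baseline tenfold makes depth roughly ten times noisier at the same range. Here is a rough sketch of that scaling under assumed numbers (the derived focal length and the 0.5-pixel disparity noise are illustrative, not Bee specifications):

```python
# Illustrative stereo-geometry arithmetic, not Bee firmware.
import math

def focal_length_px(width_px: float, hfov_deg: float) -> float:
    """Approximate focal length in pixels from sensor width and horizontal FOV."""
    return 0.5 * width_px / math.tan(math.radians(hfov_deg) / 2)

def depth_error_m(range_m: float, baseline_m: float, f_px: float,
                  disparity_noise_px: float = 0.5) -> float:
    """First-order depth uncertainty: dZ ~ Z^2 / (f * B) * disparity noise."""
    return range_m ** 2 / (f_px * baseline_m) * disparity_noise_px

f = focal_length_px(1280, 81)            # ~750 px for the depth cameras
for baseline_m in (0.13, 0.013):         # Bee-style 13 cm vs. phone-style ~1.3 cm
    err = depth_error_m(10.0, baseline_m, f)
    print(f"baseline {baseline_m * 100:.0f} cm -> ~{err:.2f} m depth error at 10 m")
```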
For positioning, we use a dual-band L1/L5 u-blox F10N GNSS module paired with a large active antenna: 25 x 25 x 8 mm on a 70 x 70 mm ground plane. We want to emphasize the antenna, because it's one of the most underappreciated factors in positional accuracy. No smartphone manufacturer would allocate this much physical space to a GPS antenna. No dashcam manufacturer would either. But antenna size directly determines signal quality, and signal quality directly determines whether your positioning is accurate to 10 meters or to sub-meter. The GNSS runs at 10 Hz, with hardware-level timestamp synchronization between GNSS fixes, camera frames, and IMU samples. This isn't software-estimated synchronization. It's a hardware PPS (pulse-per-second) signal that keeps everything in lockstep.
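To make the synchronization point concrete, here is a toy illustration rather than the Bee's firmware: when every GNSS fix, camera frame, and IMU sample carries a timestamp from the same PPS-disciplined clock, associating them is a trivial nearest-neighbor lookup instead of an offset-estimation problem.

```python
# Toy example: associate sensor streams that share one hardware timebase.
from bisect import bisect_left

def nearest(timestamps: list[float], t: float) -> int:
    """Index of the timestamp closest to t (timestamps must be sorted)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

frame_ts = [k / 30.0 for k in range(90)]    # 30 fps camera frames
gnss_ts = [k / 10.0 for k in range(30)]     # 10 Hz GNSS fixes
imu_ts = [k / 100.0 for k in range(300)]    # 100 Hz IMU samples

for t in frame_ts[:3]:
    g, i = nearest(gnss_ts, t), nearest(imu_ts, t)
    print(f"frame @ {t:.3f}s -> gnss @ {gnss_ts[g]:.1f}s, imu @ {imu_ts[i]:.2f}s")
```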
The IMU is a 6-axis unit — three-axis accelerometer plus three-axis gyroscope — capable of output rates from 1 to 100 Hz. This matters for sensor fusion: when the GPS signal degrades in an urban canyon or a tunnel, the IMU and visual odometry maintain positional continuity.
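As a toy illustration of what maintaining positional continuity means (a sketch, not the Bee's actual fusion filter): during a short outage, the pose can be propagated from the last good fix using the gyroscope's yaw rate and a speed estimate, for example from visual odometry.

```python
# Minimal 2D dead-reckoning propagation during a GNSS outage. Illustrative only;
# the speed and yaw-rate inputs are assumed to come from IMU + visual odometry.
import math
from dataclasses import dataclass

@dataclass
class Pose:
    x: float        # metres east of the last good GNSS fix
    y: float        # metres north of the last good GNSS fix
    heading: float  # radians, 0 = east

def propagate(p: Pose, speed_mps: float, yaw_rate_rps: float, dt: float) -> Pose:
    """Advance the pose by one IMU interval while GNSS is unavailable."""
    heading = p.heading + yaw_rate_rps * dt
    return Pose(
        x=p.x + speed_mps * math.cos(heading) * dt,
        y=p.y + speed_mps * math.sin(heading) * dt,
        heading=heading,
    )

pose = Pose(0.0, 0.0, 0.0)
for _ in range(300):                                   # 3 s of 100 Hz updates in a tunnel
    pose = propagate(pose, speed_mps=15.0, yaw_rate_rps=0.05, dt=0.01)
print(f"estimated offset after outage: ({pose.x:.1f} m, {pose.y:.1f} m)")
```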
For compute, the Bee carries a 5.1 TOPS neural processing unit that runs real-time object detection and classification on every frame. That's not a batch process that happens later in the cloud. It's happening continuously, on the device, as the vehicle drives. The system detects and precisely positions approximately 100 types of road features: speed limit signs, stop signs, traffic lights, turn restrictions, fire hydrants, lane lines, vulnerable road users, and more.
The device has 64 GB of internal flash storage, enough for several days of continuous driving. Data is retained on-device until successfully uploaded and acknowledged by our cloud infrastructure. LTE provides always-on connectivity for small, time-critical payloads — structured detections, location tracks. WiFi handles the larger batched uploads — full-resolution frames, depth data, video clips. Two USB-C ports provide power and data.
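A quick back-of-the-envelope check on "several days", assuming video dominates the storage budget and using the bitrate range from the spec table (depth data and detections will eat into this somewhat):

```python
# Rough storage arithmetic from the published bitrate range; not an official figure.
STORAGE_GB = 64
for mbps in (1.5, 5.0):
    gb_per_hour = mbps * 3600 / 8 / 1000        # Mbit/s -> GB per hour
    hours = STORAGE_GB / gb_per_hour
    print(f"{mbps} Mbps -> {gb_per_hour:.2f} GB/hour -> ~{hours:.0f} hours of video")
```

At the top of the bitrate range that works out to roughly 28 hours of video; at the bottom, roughly 95. Either way, several days of typical driving fit comfortably.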
One more thing worth highlighting: privacy blurring happens on the device, before any data is uploaded. Faces and license plates are irreversibly blurred as part of the onboard processing pipeline. This isn't a policy choice applied downstream — it's an architectural decision baked into the hardware and firmware. We think this is the right way to handle privacy in any system that captures street-level imagery at scale.
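The shape of the idea looks roughly like the sketch below; the detector call is a hypothetical stand-in and this is not the Bee's actual pipeline. The point is the ordering: redaction sits between capture and the upload queue, so unblurred pixels never leave the device.

```python
# Minimal sketch of blur-before-upload; detect_faces_and_plates() is hypothetical.
import cv2
import numpy as np

def redact(frame: np.ndarray, boxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Blur each (x, y, w, h) region before the frame is queued for upload."""
    for x, y, w, h in boxes:
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame

# frame = capture_frame()                                   # from the camera pipeline
# frame = redact(frame, detect_faces_and_plates(frame))     # on-device detector
# upload_queue.put(frame)                                   # only redacted frames leave the device
```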
The intelligence pipeline
The hardware specifications above are necessary but not sufficient. What makes the Bee genuinely different, we think, is the end-to-end intelligence pipeline that runs on every device.
The Bee captures 4K video at 30 fps continuously while powered. It's doing several things at once: detecting road features and classifying them, estimating depth through stereo vision, fusing GPS and IMU data for precise positioning, and deciding what needs to be uploaded immediately versus what can wait.
This last point — data prioritization — is more important than it might sound. A naive system would upload everything and let the cloud sort it out. But that approach scales terribly. Cellular bandwidth is expensive, cloud storage and compute costs grow linearly with data volume, and most of the raw imagery is redundant for mapping purposes. You don't need a thousand images of the same stop sign. You need to know the stop sign exists, where exactly it is, and whether it has changed.
So the Bee makes intelligent decisions about what matters. Structured detections ("there is a 35 mph speed limit sign at coordinates X, Y, Z with confidence 0.97") go out over LTE in near real time. Larger payloads like full-resolution imagery and video clips are delivered over LTE or WiFi depending on urgency and payload size. The result is that map consumers get actionable intelligence quickly, without the system drowning in redundant data.
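A simplified sketch of that tiering decision is below; the field names, payload shape, and size cutoff are illustrative assumptions, not the Bee's actual schema.

```python
# Illustrative uplink routing: small structured detections now, bulk data later.
import json

LTE_PAYLOAD_LIMIT_BYTES = 4096          # assumed cutoff for "small enough to send now"

def route(payload: dict) -> str:
    """Decide which uplink a payload should take."""
    size = len(json.dumps(payload).encode())
    if payload["kind"] == "detection" and size <= LTE_PAYLOAD_LIMIT_BYTES:
        return "lte_now"                # small and time-critical: send immediately
    return "wifi_batch"                 # imagery, depth, video: wait for a WiFi session

detection = {
    "kind": "detection",
    "feature": "speed_limit_sign",
    "value_mph": 35,
    "position": {"lat": 37.7749, "lon": -122.4194, "alt_m": 12.0},
    "confidence": 0.97,
}
print(route(detection))                                     # -> lte_now
print(route({"kind": "frame", "jpeg_b64": "x" * 100000}))   # -> wifi_batch
```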
What you can build with this data
All of this ultimately flows into a set of REST APIs that are designed for programmatic consumption — by humans and by AI agents.
Street-Level Imagery returns geolocated dashcam frames with timestamps, GPS accuracy, and sensor metadata for any area you query. Map Features returns detected road objects with precise positions, confidence scores, and associated imagery — speed limits, stop signs, turn restrictions, fire hydrants, lane lines. AI Event Videos returns clips of driving events like harsh braking, swerving, or speeding, each with synchronized GNSS and IMU data. Burst Locations lets you request fresh data collection for specific areas — draw a polygon, our drivers get notified, and you get updated coverage.
Every endpoint accepts standard GeoJSON geometries. The APIs are live, documented, and interactive at docs.beemaps.com.
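As a sketch of what a query might look like from Python: the endpoint URL, parameter names, and auth header below are placeholders, and the real request shapes are documented at docs.beemaps.com.

```python
# Hypothetical request shape; substitute the real Map Features endpoint, parameters,
# and authentication from docs.beemaps.com.
import requests

API_KEY = "your-api-key"
polygon = {                                      # any standard GeoJSON geometry
    "type": "Polygon",
    "coordinates": [[
        [-122.42, 37.77], [-122.41, 37.77],
        [-122.41, 37.78], [-122.42, 37.78],
        [-122.42, 37.77],
    ]],
}

resp = requests.post(
    "https://api.beemaps.example/map-features",  # placeholder URL
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"geometry": polygon, "feature_types": ["speed_limit_sign", "stop_sign"]},
    timeout=30,
)
resp.raise_for_status()
for feature in resp.json().get("features", []):
    print(feature)
```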
We also provide an MCP server — a Model Context Protocol endpoint — that exposes all of these APIs as tools that AI agents can invoke directly. If you're building an autonomous workflow that needs to reason about the physical world — verify what's at a location, check road conditions along a route, find driving incidents in an area — the agent can query our data programmatically without any custom integration code.
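At the protocol level, MCP is JSON-RPC 2.0, so a tool invocation is just a small structured message. The tool name and arguments below are hypothetical; an agent discovers the server's real tool catalogue with a tools/list request.

```python
# Shape of an MCP tool invocation on the wire (JSON-RPC 2.0); the tool name and
# arguments are invented for illustration.
import json

list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "map_features",                                    # hypothetical tool name
        "arguments": {"lat": 37.7749, "lon": -122.4194, "radius_m": 250},
    },
}

print(json.dumps(call_tool, indent=2))
```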
An open platform for edge AI
There's one more capability that we think represents a genuinely new category of infrastructure, and it's one we're particularly excited about.
Through the Bee Edge AI Platform, external partners can deploy their own AI models directly onto Bee devices via over-the-air updates. These are Python modules that run alongside the native Map AI stack. Partners can target their deployments geographically — by country, state, metro area, or city — and choose what data they want back: structured JSON, imagery, imagery with depth, video, or raw telemetry.
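To give a feel for the programming model, here is a hypothetical shape for such a module; the actual platform interface may differ, and the detector is an invented example.

```python
# Hypothetical partner edge module. Assume the device runtime calls process_frame()
# for each camera frame and uploads whatever it returns, in the data format the
# partner selected (structured JSON in this sketch).
import numpy as np

class RoadSurfaceDamageDetector:
    """Invented example: flag likely potholes or surface damage in each frame."""

    def process_frame(self, frame: np.ndarray, position: dict) -> dict | None:
        score = self._score(frame)
        if score < 0.8:
            return None                      # nothing worth uploading from this frame
        return {
            "type": "road_surface_damage",
            "confidence": round(score, 2),
            "position": position,            # lat/lon/alt from the device's GNSS fix
        }

    def _score(self, frame: np.ndarray) -> float:
        # Placeholder for model inference on the on-device NPU.
        return float(frame.mean() > 200)     # stand-in heuristic, not a real model
```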
The implication is significant. The Bee fleet has already mapped 36% of the world's roads across more than 100 countries. If you have a custom detection model — say you want to identify a specific type of infrastructure, or classify road surface conditions, or detect something entirely novel — you can deploy it across this existing fleet without building, shipping, or maintaining a single piece of hardware. The fleet becomes a programmable sensor network.
We think this is the direction the industry is heading. The expensive part of physical-world AI isn't the models — it's the sensors, the positioning, the connectivity, the fleet operations. We've built all of that. The Bee Edge AI Platform lets others build on top of it.
What this adds up to
We've tried to be specific rather than vague throughout this post, because we think the details are where the real story is. So we'll close with an honest framing of what we've built and why.
The mapping industry has operated for decades under the assumption that there's an inherent tradeoff between data quality and collection scale. Dedicated mapping vehicles produce excellent data but are prohibitively expensive to scale. Crowdsourced approaches scale well but produce inconsistent or impoverished data. This tradeoff has been treated as a law of nature.
We don't think it is. We think it's an artifact of using hardware that was designed for other purposes — phones designed for communication, dashcams designed for insurance footage, cars designed for transportation. When you design hardware specifically for the task of mapping — with the right cameras, the right positioning, the right compute, the right connectivity, all tightly integrated and synchronized — the tradeoff dissolves.
The Bee is that hardware. 36% of the world's roads. 100+ countries. Real-time edge AI. Privacy by design. And APIs built for an era where both humans and AI agents need programmatic access to the physical world.
Explore the APIs: docs.beemaps.com
Get your API key: beemaps.com/developers
Follow us on X or Try Bee Maps for Free.

