We build training data for frontier models.

We work directly with research labs and AI companies on bespoke data problems. Our team includes frontier lab alumni and PhDs who understand what your models actually need.


What we do

Most training data vendors sell volume. We sell specificity. If you need a thousand carefully constructed examples of multi-step mechanical reasoning, or video demonstrations of robotic manipulation tasks with motion-capture ground truth, we're built for that.

Training data

Long-tail and domain-specific datasets across text, image, video, and motion capture. We focus on the data that's hardest to get right: multimodal, expert-level, and constructed for your specific training objectives.

Evaluations

Custom eval suites designed by people who've built them at frontier labs. There's a real difference between evals that look good on a leaderboard and evals that actually tell you something about your model.

RL environments

Purpose-built reinforcement learning environments for post-training. Reward modeling, preference data, and RLHF/RLAIF pipelines are built to your spec, not adapted from a template.


Team

Xiwen

CEO

MechE PhD. 20+ years in manufacturing and robotics. Bridges the gap between physical systems and the data needed to model them.

Mike

CTO

Frontier lab research engineer. RL and post-training on modern multimodal foundation models.

Thomas

Head of Sales

Former senior account manager at Salesforce (99th percentile). Over a decade in enterprise sales.


Get in touch