The Sovereign Path for Continual Learning: Early Results on Harvey LAB with NVIDIA Nemotron

Jerry Chan

May 29, 2026

Field Report

Software is moving away from intelligence that is frozen in time and toward intelligence that learns and continuously improves. Continual learning makes it possible to keep improving models, harnesses, and prompts as they run.

Legal is one of the areas we're most excited to work on, and one of the hardest. It spans a wide range of behaviors and skill sets (domain knowledge, research, analysis, financial math, precise citations) across many diverse practice areas. It is also among the most demanding environments for data sovereignty, where the work is privileged and the standards for handling it are absolute.

We’re partnering with Harvey to explore continual learning under these constraints on their newly released LAB benchmark, an open-source benchmark for legal agents that contains 1,200 agent tasks across 24 practice areas.

Provenance and Sovereign AI

Continual learning is powerful, but in regulated industries it runs into a hard set of requirements. Before applications can put an agent into high-stakes work environments, they have to answer a number of questions: where it was developed, where it runs, what architecture it was built on, and how interpretable its outputs are.

Open-weight models help meet those conditions. With open-weight models, agent work never has to leave the data boundary (or region) that the user controls, because the model itself can be hosted inside the firm's own secure cloud environment. And because nothing is hidden behind an external API, the system becomes auditable in a way closed models cannot match, with full visibility into the model's reasoning traces, tool calls, and intermediate decisions. Post-training on an open base turns an agent from a black box into something a firm can inspect, govern, and stand behind.

At Trajectory, our goal is to create the platform for continual learning, and that means building it with the questions above at its core. One way we’ve done this is by making the platform model-agnostic. The learning layer is decoupled from any single base model, which means that as data flows in, you choose the model you want to improve, and you can choose open weights when sovereignty and provenance are non-negotiable.

That design, together with the enterprise landscape, is what makes a model like NVIDIA Nemotron 3 so exciting to us. Nemotron 3 Super is a 120-billion-parameter open model with roughly 12 billion parameters active per forward pass, a hybrid Mamba-Transformer mixture-of-experts built for agentic workloads, with context up to a million tokens. On top of the compelling architecture, NVIDIA released it in the open: open weights, open data, and an open recipe. That is a base you can actually own, audit, and continually improve inside a customer's boundary. It is exactly the kind of model our model-agnostic stack was built to take advantage of.

This post is our first look at post-training Nemotron 3 Super for legal work.

Post-training Nemotron 3 Super Already Approaches the Frontier

On Trajectory's platform, post-training lifts Nemotron 3 Super substantially, to performance that approaches and, on some measures, matches today's leading closed frontier models. The gains hold across both the all-pass rate, where an agent must complete a task perfectly or be marked wrong (close to the bar real legal work sets), and the more granular rubric-pass criteria.
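To make the difference between the two measures concrete, here is a minimal sketch of how they could be computed. It assumes each task is graded against a list of pass/fail rubric criteria and that the rubric-pass rate is pooled over all criteria; the function names, the data layout, and the example grades are illustrative assumptions, not LAB's actual scoring code or results.

```python
from typing import Dict, List

# Assumed layout: task id -> one boolean per rubric criterion (True = passed).
TaskResults = Dict[str, List[bool]]


def all_pass_rate(results: TaskResults) -> float:
    """Fraction of tasks in which every rubric criterion passed."""
    if not results:
        return 0.0
    # A task with no criteria is not counted as a pass.
    perfect = sum(1 for c in results.values() if c and all(c))
    return perfect / len(results)


def rubric_pass_rate(results: TaskResults) -> float:
    """Fraction of individual rubric criteria passed, pooled across tasks."""
    total = sum(len(c) for c in results.values())
    if total == 0:
        return 0.0
    passed = sum(sum(c) for c in results.values())
    return passed / total


# Made-up grades for three hypothetical tasks.
example: TaskResults = {
    "task_a": [True, True, True],         # perfect, counts toward all-pass
    "task_b": [True, False, True, True],  # one miss, fails all-pass
    "task_c": [False, False],             # fails everything
}

print(f"all-pass:    {all_pass_rate(example):.3f}")
print(f"rubric-pass: {rubric_pass_rate(example):.3f}")
```

In this toy data, one missed criterion costs task_b its whole all-pass credit while it still contributes three passed criteria to the rubric-pass rate, which is why the all-pass number is the stricter of the two.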

Trajectory’s Nemotron 3 Super model matches GPT 5.5 after post-training

Additionally, it sees a +25% lift on rubric-pass criteria

Post-Training is Pareto Optimal

However, overall accuracy isn’t the only measure that matters. As we move to a many-model future, what makes a model “good” also becomes a function of its cost, speed, and ability to learn.

Due to Nemotron 3 Super’s size and post-trainability, we’re able to surpass the current Performance vs. Cost frontier on Harvey’s LAB benchmark.
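As a rough illustration of what a performance vs. cost frontier means, the sketch below picks out the models that no other model beats on both cost and score at once. The model names, costs, and scores are invented placeholders for illustration only; they are not measurements from LAB or from this report.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost: float   # hypothetical cost per task, arbitrary units
    score: float  # hypothetical benchmark score in [0, 1]


def pareto_frontier(points: List[ModelPoint]) -> List[ModelPoint]:
    """Return points not dominated by a cheaper-or-equal, better-scoring point."""
    frontier: List[ModelPoint] = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; at equal cost, best score first.
    for p in sorted(points, key=lambda p: (p.cost, -p.score)):
        # Keep a point only if it beats every cheaper (or equal-cost) point.
        if p.score > best_score:
            frontier.append(p)
            best_score = p.score
    return frontier


# Invented placeholder values, not real benchmark results.
points = [
    ModelPoint("small_base", cost=1.0, score=0.30),
    ModelPoint("small_post_trained", cost=1.0, score=0.55),
    ModelPoint("mid_closed", cost=4.0, score=0.50),
    ModelPoint("large_closed", cost=10.0, score=0.60),
]

for p in pareto_frontier(points):
    print(f"{p.name}: cost={p.cost}, score={p.score}")
```

With these placeholder numbers, the post-trained small model pushes both its own base model and the mid-priced model off the frontier, which is the kind of shift the Performance vs. Cost comparison is meant to capture.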

Nemotron 3 Super approaches frontier performance at a fraction of the cost

Nemotron 3 Super’s post-trainability allows Trajectory to turn tasks that pass ~30% of the time into tasks that pass ~90% of the time

What This Looks Like in Practice

After post-training, Nemotron 3 Super gets markedly better at the behaviors that matter most for a legal agent: spotting issues, recommending solutions, explaining its reasoning, and citing sources accurately. Some practice areas gain more than others, with Banking & Finance and Environmental/ESG showing especially strong growth.

Trajectory Nemotron 3 Super becomes markedly better in a few areas crucial for any legal agent: spotting issues, recommending solutions, explaining those solutions, and citing references.

Additionally, certain practice areas see more gains than others. Banking & Finance and Environmental/ESG see particularly strong growth.

In Summary

Consumers of intelligence are increasingly shifting to deployments that allow them to own their intelligence and keep improving it over time. Harvey is making this possible for legal with LAB, and open models are what make it achievable.

Models like Nemotron 3 Super set the bar for enterprise-ready open agents with auditable weights, real security, and clear provenance. These are the properties that let a firm host frontier intelligence inside its own boundary, govern it, and improve it without ever giving up control of its data. That is what opens the door to putting frontier intelligence into regulated work and improving it where the work happens.

We are excited to keep building, and even more excited to try Nemotron 3 Ultra when it arrives, as it pushes the frontier of open source even further.
