Nested Learning

A new paradigm that treats ML models as systems of interconnected optimization problems, aiming to enable continual learning without catastrophic forgetting.

Research Preview

Nested Learning was introduced at NeurIPS 2025 by Google Research. This is cutting-edge research that is not yet widely adopted in production systems.

What is Nested Learning?

Nested Learning is a new paradigm that views a machine learning model not as one continuous learning process, but as a system of interconnected, multi-level optimization problems that are solved simultaneously, each at its own speed.

The key insight is that a model's architecture (layers, modules) and its optimization procedure (the learning algorithm) need not be separate concerns—they can be seen as different levels of a unified learning system.
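One way to picture such a unified system (a minimal conceptual sketch in Python; the `Level` structure, its field names, and the periods are illustrative assumptions, not the paper's formalism) is as a stack of optimization problems, each with its own state, its own update rule, and its own timescale:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Level:
    """One optimization problem in the nested hierarchy."""
    name: str
    params: dict          # the state this level learns
    update: Callable      # this level's own update rule
    period: int           # steps between updates (its timescale)

# A "model" is then a stack of such problems, optimized together but at
# different speeds, rather than a single monolithic training loop.
# The update rules here are placeholders; a real level would change its params.
model_as_levels = [
    Level("inner",  params={}, update=lambda p, ctx: p, period=1),
    Level("middle", params={}, update=lambda p, ctx: p, period=100),
    Level("outer",  params={}, update=lambda p, ctx: p, period=10_000),
]
```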

The Problem: Catastrophic Forgetting

Current LLMs face a fundamental limitation: they cannot truly "learn" after training. When you try to teach a model new information, it tends to forget what it previously knew—a phenomenon called catastrophic forgetting.

  • Training on Task B degrades performance on Task A
  • Fine-tuning often "breaks" general capabilities
  • Knowledge is frozen at pre-training time
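
The failure mode is easy to reproduce in a toy setting. The sketch below (in PyTorch; the two regression tasks, network size, and step counts are arbitrary choices, not from the paper) trains a small network on task A, fine-tunes it on task B, and then re-measures task A:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two toy regression "tasks": the same inputs mapped through different targets.
x = torch.randn(256, 8)
task_a_y = x @ torch.randn(8, 1)   # task A: one random linear map
task_b_y = x @ torch.randn(8, 1)   # task B: a different random linear map

model = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

def train(targets, steps=500):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), targets).backward()
        opt.step()

train(task_a_y)
print("task A loss after training on A:", loss_fn(model(x), task_a_y).item())

train(task_b_y)   # naive sequential fine-tuning on task B
print("task A loss after training on B:", loss_fn(model(x), task_a_y).item())  # much worse
```

Because both tasks share the same weights, optimizing for task B pulls the network away from its task A solution.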

Core Insight: Architecture = Optimization

Traditional deep learning treats model architecture and the learning algorithm as separate. Nested Learning unifies them:

1. Each "layer" is itself a learning problem with its own context and update rule (see the sketch below)
2. Different components update at different timescales (like biological brains)
3. The hierarchy of update frequencies creates "levels" of learning
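
Point 1 can be made concrete with a "fast weight" style layer: the layer carries its own state and applies its own update rule on every forward call, while the surrounding "slow" parameters are trained by the usual outer gradient loop. This is a hedged sketch of that idea, not Hope's actual construction; the Hebbian-style write rule and all names are illustrative:

```python
import torch
import torch.nn as nn

class FastWeightLayer(nn.Module):
    """A layer that is itself a tiny learning problem: it rewrites its own
    weight matrix with a local update rule at every forward call."""
    def __init__(self, dim, lr=0.1):
        super().__init__()
        self.register_buffer("fast_w", torch.zeros(dim, dim))  # inner-level state
        self.lr = lr

    def forward(self, k, v):
        # Inner update rule: write the (key, value) association into fast_w.
        self.fast_w += self.lr * torch.outer(v, k)
        # Inner "inference": read back using the key.
        return self.fast_w @ k

dim = 16
layer = FastWeightLayer(dim)
slow_proj = nn.Linear(dim, dim)   # outer-level parameters, trained by ordinary SGD

k, v = torch.randn(dim), torch.randn(dim)
out = slow_proj(layer(k, v))      # one module learns every step, the other much more slowly
```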

The Nested Structure

Outer Loop

Slow updates — consolidates long-term knowledge

Middle Loop

Medium updates — learns recurring patterns

Inner Loop

Fast updates — adapts to immediate context

By separating learning into multiple timescales, each level can focus on different aspects of the task without interfering with others.
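
A minimal way to realize this separation in code is to give each level its own parameter group and update it at its own frequency. In the sketch below, the layer sizes, learning rates, and update periods are arbitrary assumptions:

```python
import torch
import torch.nn as nn

# One parameter group per level of the hierarchy.
inner  = nn.Linear(16, 16)   # fast: adapts to the immediate context
middle = nn.Linear(16, 16)   # medium: recurring patterns
outer  = nn.Linear(16, 1)    # slow: long-term knowledge

opt_inner  = torch.optim.SGD(inner.parameters(),  lr=1e-2)
opt_middle = torch.optim.SGD(middle.parameters(), lr=1e-3)
opt_outer  = torch.optim.SGD(outer.parameters(),  lr=1e-4)
loss_fn = nn.MSELoss()

for step in range(1, 1001):
    x, y = torch.randn(32, 16), torch.randn(32, 1)
    loss = loss_fn(outer(middle(inner(x))), y)
    loss.backward()                                 # gradients accumulate at every level

    opt_inner.step();  opt_inner.zero_grad()        # inner loop: update every step
    if step % 10 == 0:
        opt_middle.step(); opt_middle.zero_grad()   # middle loop: every 10 steps
    if step % 100 == 0:
        opt_outer.step();  opt_outer.zero_grad()    # outer loop: every 100 steps
```

The slower groups accumulate gradients between updates, so each of their steps consolidates information from many recent batches.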

Traditional vs Nested Learning

Traditional Learning

Single optimization loop. Learning Task B overwrites Task A knowledge. No timescale separation.

Nested Learning

Multi-level optimization. Each level preserves different types of knowledge. Natural continual learning.

The Hope Architecture

Google Research introduced Hope, a proof-of-concept architecture that implements Nested Learning principles:

Features

  • Self-modifying learning module that can learn its own update rules
  • Continuum Memory System (CMS) for extended context handling (see the sketch after this list)
  • Unbounded in-context learning through self-referential optimization
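
The paper's CMS is not reproduced here, but the underlying idea of a spectrum of memory modules, each refreshed at its own frequency, can be sketched as follows (class names, the residual read, and the period values are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MemoryBlock(nn.Module):
    """One point on the memory continuum: an MLP whose weights are meant to be
    refreshed only every `period` steps, so it holds onto older context longer."""
    def __init__(self, dim, period):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.period = period      # consumed by an external multi-timescale training schedule

    def forward(self, h):
        return h + self.net(h)    # residual read from this memory level

class ContinuumMemory(nn.Module):
    """A chain of memory blocks spanning fast-to-slow timescales."""
    def __init__(self, dim, periods=(1, 16, 256)):
        super().__init__()
        self.blocks = nn.ModuleList(MemoryBlock(dim, p) for p in periods)

    def forward(self, h):
        for block in self.blocks:
            h = block(h)          # every read passes through every timescale
        return h

cms = ContinuumMemory(dim=64)
tokens = torch.randn(8, 64)       # a batch of token representations
out = cms(tokens)
```

In a training schedule like the multi-timescale loop shown earlier, each block's `period` would control how often its weights are updated, so fast blocks track the current context while slow blocks retain older information.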

Results

  • Outperforms Transformers on language modeling (lower perplexity)
  • Better common-sense reasoning accuracy
  • Superior performance on long-context Needle-In-Haystack tasks

Why This Matters

If validated at scale, Nested Learning could fundamentally change how we build and deploy AI systems:

Continual Learning

Models that can keep learning after deployment, without full retraining

Efficiency

Potentially much more efficient than current architectures

Biological Alignment

Closer to how biological brains actually learn and remember

Key Takeaways

  • Nested Learning treats models as multi-level optimization problems, not single continuous processes
  • It addresses catastrophic forgetting by separating learning into different timescales
  • The Hope architecture shows promising results on language modeling and reasoning tasks
  • This is active research (NeurIPS 2025): not yet production-ready, but worth watching