Representation over Routing: Diagnosing Temporal Routing Pathologies in Multi-Timescale PPO

This model repository hosts pretrained PyTorch actor weights for the diagnostic study "Representation over Routing: Diagnosing Temporal Routing Pathologies in Multi-Timescale PPO".

The weights correspond to controlled PPO experiments on LunarLander-v2. They are provided to reproduce the qualitative behaviors discussed in the paper: a single-horizon baseline, differentiable temporal routing, error-based temporal routing, and Target Decoupling.

This model repository is a weight distribution package. Training scripts and selected generated figures live in the GitHub code repository; paper text and source files are distributed through arXiv.

Model Weights Overview

The repository provides four standalone .pth actor weight files:

1_baseline.pth (Baseline PPO): single-horizon PPO reference policy.
2_surrogate_hacking_attention.pth (Differentiable Routing): policy from the actor-side attention routing diagnostic.
3_temporal_paradox_variance.pth (Error-Based Routing): policy from the gradient-free error-based routing diagnostic.
4_target_decoupling_final.pth (Target Decoupling): policy trained with structural separation between the actor objective and temporal routing. The actor uses the long-horizon advantage, while auxiliary critic heads remain as regularizers during training.
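The list above defines Target Decoupling by which signal reaches the actor. Only the long-horizon advantage enters the actor loss, and the other critic heads are kept as auxiliary regression targets.

The NumPy sketch below shows that separation. It is illustrative only, and these details are assumptions rather than values from the training scripts:

- the helper names `gae` and `decoupled_losses`;
- GAE with `lam=0.95`;
- per-batch advantage normalization;
- a PPO clip of 0.2.

A real implementation would compute these losses on PyTorch tensors so that gradients flow.

```python
import numpy as np


def gae(rewards, values, dones, last_value, gamma, lam=0.95):
    """GAE for one rollout under one discount factor gamma.

    values[t] is V(s_t) from the critic head for this gamma, last_value
    bootstraps the state after the final step, and dones[t] = 1.0 if the
    episode ended at step t.
    """
    rewards, values, dones = (np.asarray(x, dtype=float) for x in (rewards, values, dones))
    adv = np.zeros_like(rewards)
    running, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        adv[t] = running
        next_value = values[t]
    return adv


def decoupled_losses(ratio, advs_by_gamma, returns_by_gamma, preds_by_gamma,
                     actor_gamma, clip=0.2):
    """Hypothetical Target Decoupling objective (not the authors' code).

    The actor's clipped surrogate uses only the advantage for actor_gamma
    (assumed to be the longest horizon); no routing weights over horizons
    enter the actor loss. Every critic head still regresses its own return
    target as an auxiliary term.
    """
    a = np.asarray(advs_by_gamma[actor_gamma], dtype=float)
    a = (a - a.mean()) / (a.std() + 1e-8)
    ratio = np.asarray(ratio, dtype=float)
    surrogate = np.minimum(ratio * a, np.clip(ratio, 1 - clip, 1 + clip) * a)
    actor_loss = -surrogate.mean()
    # Sum of per-head mean squared errors; all horizons are kept as regularizers.
    critic_loss = sum(
        np.mean((np.asarray(preds_by_gamma[g], dtype=float)
                 - np.asarray(returns_by_gamma[g], dtype=float)) ** 2)
        for g in returns_by_gamma
    )
    return actor_loss, critic_loss
```

In this sketch, changing any short-horizon advantage leaves `actor_loss` unchanged. That is the sense in which the actor-side routing pathway is removed.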

Target Decoupling is described in the paper as a structural isolation principle for the LunarLander-v2 PPO setting. The reported evidence concerns two things in the tested run set: removal of the actor-side routing pathway, and an improved observed worst-seed return. It is not evidence of broad benchmark superiority.
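"Worst-seed return" summarizes a run set by its weakest training seed rather than its average. The sketch below is a minimal example that assumes each seed contributes a list of evaluation episode returns. The helper name `worst_seed_return` and the optional `last_n` window are hypothetical; the paper's exact evaluation window and aggregation are not stated here.

```python
import numpy as np


def worst_seed_return(returns_by_seed, last_n=None):
    """Return (worst seed, its mean return) across independent training seeds.

    returns_by_seed maps a seed id to that seed's evaluation episode returns.
    If last_n is given, only the final last_n episodes of each seed are
    averaged (a hypothetical choice, not taken from the paper).
    """
    means = {}
    for seed, rets in returns_by_seed.items():
        rets = np.asarray(rets, dtype=float)
        if last_n is not None:
            rets = rets[-last_n:]
        means[seed] = float(rets.mean())
    # The seed whose mean return is lowest defines the worst-case outcome.
    worst = min(means, key=means.get)
    return worst, means[worst]
```

A method can raise the mean return across seeds while one seed still collapses. Reporting the minimum per-seed mean exposes that case.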

Usage

For training scripts and selected diagnostic plots, see the GitHub repository. The manuscript itself is distributed through arXiv rather than duplicated as source files in the code or model repositories.

The published weights contain actor parameters and can be loaded into the same MLP actor architecture used by the training scripts:

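import torch
import torch.nn as nn
import numpy as np
import gymnasium as gym  # LunarLander needs the Box2D extra: pip install "gymnasium[box2d]"
from huggingface_hub import hf_hub_download

# Download the Target Decoupling actor; swap the filename for any of the other three files.
weight_path = hf_hub_download(
    repo_id="ben-dlwlrma/Representation-Over-Routing",
    filename="4_target_decoupling_final.pth",
)

def layer_init(layer, std=np.sqrt(2), bias_const=0.0):
    # Orthogonal initialization helper. The values are overwritten by
    # load_state_dict below, so only the layer shapes matter for inference.
    nn.init.orthogonal_(layer.weight, std)
    nn.init.constant_(layer.bias, bias_const)
    return layer

# 8-dim LunarLander observation -> two 64-unit tanh layers -> 4 discrete action logits.
actor = nn.Sequential(
    layer_init(nn.Linear(8, 64)),
    nn.Tanh(),
    layer_init(nn.Linear(64, 64)),
    nn.Tanh(),
    layer_init(nn.Linear(64, 4), std=0.01),
)

# map_location="cpu" lets GPU-saved weights load on CPU-only machines.
actor.load_state_dict(torch.load(weight_path, map_location="cpu", weights_only=True))
actor.eval()

# The paper used LunarLander-v2. If your Gymnasium release rejects v2,
# use "LunarLander-v3" instead (see the note below).
env = gym.make("LunarLander-v2")
state, _ = env.reset(seed=0)
done = False
episode_return = 0.0

while not done:
    state_tensor = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        logits = actor(state_tensor)
        # Greedy (deterministic) evaluation: take the highest-logit action.
        action = torch.argmax(logits, dim=1).item()

    state, reward, terminated, truncated, _ = env.step(action)
    episode_return += reward
    done = terminated or truncated

env.close()
print(f"Episode return: {episode_return:.1f}")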
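The loop above runs one greedy episode, but comparing the four checkpoints qualitatively usually takes several episodes per file. The helper below is a minimal, framework-agnostic sketch that only assumes the standard Gymnasium `reset`/`step` interface. These details are assumptions, not taken from the paper or the training scripts:

- the name `evaluate`;
- the per-episode seeding scheme `seed + ep`;
- the default of 10 episodes.

```python
import numpy as np


def evaluate(policy, env, episodes=10, seed=0):
    """Run `episodes` episodes and return the list of undiscounted returns.

    policy maps a float32 observation array to an integer action; env follows
    the Gymnasium API (reset(seed=...) -> (obs, info); step(a) -> 5-tuple).
    """
    returns = []
    for ep in range(episodes):
        # A distinct, reproducible seed per episode (hypothetical scheme).
        obs, _ = env.reset(seed=seed + ep)
        total, done = 0.0, False
        while not done:
            action = policy(np.asarray(obs, dtype=np.float32))
            obs, reward, terminated, truncated, _ = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return returns
```

To use it with the actor loaded above, call `evaluate` inside `torch.no_grad()`. Pass a policy such as `lambda obs: int(actor(torch.as_tensor(obs).unsqueeze(0)).argmax(dim=1).item())` together with the environment created by `gym.make`.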

The paper's experiments were conducted on LunarLander-v2. The hosted demo may use LunarLander-v3 for compatibility with current Gymnasium releases while preserving the same actor architecture and weight format.
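Because the paper used LunarLander-v2 while newer Gymnasium releases ship LunarLander-v3, a script may want to prefer v2 and fall back to v3 only when v2 is unavailable. The helper below is a minimal sketch; the name `pick_lunar_lander_id` is hypothetical. It assumes you pass the IDs from Gymnasium's environment registry, for example `gym.envs.registry.keys()`. Returns measured on v3 are not guaranteed to match the paper's v2 numbers.

```python
def pick_lunar_lander_id(registered_ids):
    """Prefer the paper's LunarLander-v2; fall back to v3 if only v3 is registered.

    registered_ids is any iterable of environment id strings. Note that
    registration does not imply Box2D is installed; gym.make still needs it.
    """
    ids = set(registered_ids)
    for env_id in ("LunarLander-v2", "LunarLander-v3"):
        if env_id in ids:
            return env_id
    raise ValueError("Neither LunarLander-v2 nor LunarLander-v3 is registered.")
```

A typical call would be `env = gym.make(pick_lunar_lander_id(gym.envs.registry.keys()))`.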

Citation

@misc{sunRepresentationRoutingDiagnosing2026,
  title = {Representation over {{Routing}}: {{Diagnosing Temporal Routing Pathologies}} in {{Multi-Timescale PPO}}},
  shorttitle = {Representation over {{Routing}}},
  author = {Sun, Jing},
  year = 2026,
  publisher = {arXiv},
  doi = {10.48550/ARXIV.2604.13517},
  urldate = {2026-04-16},
  copyright = {Creative Commons Attribution 4.0 International},
  keywords = {Artificial Intelligence (cs.AI),FOS: Computer and information sciences,Machine Learning (cs.LG)}
}


Linked paper page (arXiv 2604.13517): Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO
