Instructions to use ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B")
model = AutoModelForCausalLM.from_pretrained("ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B")
```

Local Apps

vLLM

How to use ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
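The same completion request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].text` field. The helper names `build_completion_request` and `complete` are hypothetical and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice.

    Assumes an OpenAI-compatible response body (hypothetical helper).
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Passing `base_url="http://localhost:30000"` points the same helper at a server on another port.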

Use Docker

```shell
docker model run hf.co/ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B
```

SGLang

How to use ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Docker Model Runner
How to use ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B with Docker Model Runner:
```
docker model run hf.co/ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B
```

AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling

Introduction

Artificial Hippocampus Networks (AHNs) transform lossless memory into fixed-size compressed representations for long-context modeling. Lossless memory (e.g., attention’s key-value (KV) cache) stores exact input information but grows with sequence length, making it inefficient for long sequences. In contrast, compressed memory (e.g., an RNN’s hidden state) maintains a constant size and a fixed computational cost per input token, at the price of information loss. To harness the benefits of both memory types, AHNs continually convert lossless memory outside the sliding attention window into compressed form. The model then integrates both memory types to make predictions across long contexts. AHNs can be instantiated with any RNN-like architecture.
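To make the two memory types concrete, here is a toy sketch in plain Python. It is not the released implementation. It keeps the last `window` key/value pairs exactly, and folds every evicted pair into a fixed-size matrix using a simplified delta-rule write, the kind of update used by DeltaNet-style recurrent modules. The class name `ToyAHNMemory`, the constant write strength `beta`, and the tiny dimension are all assumptions made for illustration; in a trained module these quantities are learned.

```python
from collections import deque


class ToyAHNMemory:
    """Sliding window of exact (key, value) pairs plus a fixed-size compressed state."""

    def __init__(self, window, dim, beta=0.5):
        self.window = deque()      # lossless memory: exact (key, value) pairs
        self.max_window = window
        self.beta = beta           # hypothetical constant write strength
        # Compressed memory: a dim x dim matrix whose size never changes.
        self.state = [[0.0] * dim for _ in range(dim)]

    def _compress(self, key, value):
        """Simplified delta-rule write: S <- S + beta * (v - S k) k^T."""
        predicted = [sum(s * k for s, k in zip(row, key)) for row in self.state]
        for i, row in enumerate(self.state):
            error = value[i] - predicted[i]
            for j in range(len(row)):
                row[j] += self.beta * error * key[j]

    def append(self, key, value):
        """Add a token; once the window is full, compress the oldest entry."""
        self.window.append((key, value))
        if len(self.window) > self.max_window:
            old_key, old_value = self.window.popleft()
            self._compress(old_key, old_value)

    def read_compressed(self, query):
        """Read from the compressed state: S q."""
        return [sum(s * q for s, q in zip(row, query)) for row in self.state]
```

However many tokens are appended, `window` never holds more than `max_window` entries and `state` stays `dim x dim`. That constant footprint is what the paragraph above refers to as fixed-size compressed memory.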

This repository hosts the model weights for AHN. For installation, usage instructions, and further documentation, please visit our GitHub repository.

Method

**(a)** Illustration of the model augmented with Artificial Hippocampus Networks (AHNs). In this example, the sliding window length is 3. When the input sequence length is less than or equal to the window length, the model operates identically to a standard Transformer. For longer sequences, AHNs continually compress the tokens outside the window into a compact memory representation. The model then uses both the lossless information within the window and the compressed memory to generate the next token. **(b)** Self-distillation training framework of AHNs based on an open-weight LLM. During training, the base LLM's weights are frozen, and only the AHNs' parameters are trained.
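The window behaviour described in (a) can be traced with a few lines of Python. This is only an illustrative sketch: `split_context` is a hypothetical helper, and the 0-based token positions are a convention chosen here, not taken from the source.

```python
def split_context(seq_len, window):
    """Return (compressed_positions, window_positions) for seq_len tokens.

    Positions are 0-based. Tokens that have slid out of the window are the
    ones an AHN would have folded into compressed memory.
    """
    start = max(0, seq_len - window)
    return list(range(start)), list(range(start, seq_len))


# Trace the example from the caption: a sliding window of length 3.
for seq_len in range(1, 6):
    compressed, in_window = split_context(seq_len, window=3)
    print(f"len={seq_len}: compressed={compressed} window={in_window}")
```

Up to length 3 the compressed list is empty, which corresponds to the standard-Transformer case. At length 5, positions 0 and 1 have been compressed and positions 2 to 4 remain in the window.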

Model Zoo

| base model | AHN module | #params | checkpoint (AHN only) |
|---|---|---|---|
| Qwen2.5-3B-Instruct | Mamba2 | 11.9M | 🤗model |
| Qwen2.5-3B-Instruct | DeltaNet | 11.8M | 🤗model |
| Qwen2.5-3B-Instruct | GatedDeltaNet | 13.0M | 🤗model |
| Qwen2.5-7B-Instruct | Mamba2 | 18.6M | 🤗model |
| Qwen2.5-7B-Instruct | DeltaNet | 18.5M | 🤗model |
| Qwen2.5-7B-Instruct | GatedDeltaNet | 21.3M | 🤗model |
| Qwen2.5-14B-Instruct | Mamba2 | 51.4M | 🤗model |
| Qwen2.5-14B-Instruct | DeltaNet | 51.1M | 🤗model |
| Qwen2.5-14B-Instruct | GatedDeltaNet | 61.0M | 🤗model |
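For a rough sense of scale, the AHN parameter counts above can be compared with the size of each base model. The sketch below assumes the nominal sizes suggested by the model names (3B, 7B, 14B). The exact base parameter counts are not given here and differ somewhat, so the percentages are only approximate.

```python
# (base model, nominal base size in billions [assumption taken from the name],
#  AHN module, AHN parameters in millions [from the table above])
MODEL_ZOO = [
    ("Qwen2.5-3B-Instruct", 3, "Mamba2", 11.9),
    ("Qwen2.5-3B-Instruct", 3, "DeltaNet", 11.8),
    ("Qwen2.5-3B-Instruct", 3, "GatedDeltaNet", 13.0),
    ("Qwen2.5-7B-Instruct", 7, "Mamba2", 18.6),
    ("Qwen2.5-7B-Instruct", 7, "DeltaNet", 18.5),
    ("Qwen2.5-7B-Instruct", 7, "GatedDeltaNet", 21.3),
    ("Qwen2.5-14B-Instruct", 14, "Mamba2", 51.4),
    ("Qwen2.5-14B-Instruct", 14, "DeltaNet", 51.1),
    ("Qwen2.5-14B-Instruct", 14, "GatedDeltaNet", 61.0),
]


def ahn_overhead_percent(base_billions, ahn_millions):
    """AHN parameters as a percentage of the (nominal) base model size."""
    return 100.0 * (ahn_millions * 1e6) / (base_billions * 1e9)


for base, size_b, module, ahn_m in MODEL_ZOO:
    print(f"{base:22s} {module:14s} {ahn_overhead_percent(size_b, ahn_m):.2f}%")
```

Under these nominal sizes, every AHN module comes out below half a percent of its base model.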

Evaluation

LV-Eval & InfiniteBench Results

LongBench Results

Contact

Yunhao Fang: yunhao.fang@bytedance.com
Weihao Yu (corresponding author): weihao.yu@bytedance.com

Citation

BibTeX:

```bibtex
@article{fang2025artificial,
  title={Artificial hippocampus networks for efficient long-context modeling},
  author={Fang, Yunhao and Yu, Weihao and Zhong, Shu and Ye, Qinghao and Xiong, Xuehan and Wei, Lai},
  journal={arXiv preprint arXiv:2510.07318},
  year={2025}
}
```

Paper for ByteDance-Seed/AHN-DN-for-Qwen-2.5-Instruct-7B

Artificial Hippocampus Networks for Efficient Long-Context Modeling (arXiv:2510.07318, published Oct 8, 2025)