Training data: Salesforce/wikitext
A 6-layer DeepSeek-V3 model with Multi-head Latent Attention (MLA), trained for research on shared subspaces in Transformer attention. It is part of the shared-subspaces research project, which investigates the impact of shared output latent spaces in Transformer attention mechanisms.
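To see the MLA-specific dimensions of this checkpoint, you can inspect its configuration. A minimal sketch, assuming the standard DeepseekV3Config attribute names in transformers; the printed values depend on the checkpoint itself:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("ChrisMcCormick/deepseek-tiny-v0.1")

# Depth and MLA latent ranks (attribute names from transformers' DeepseekV3Config)
print(config.num_hidden_layers)  # number of Transformer layers (6 for this model)
print(config.kv_lora_rank)       # rank of the shared key/value latent space
print(config.q_lora_rank)        # rank of the query latent space (may be None in some configs)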
from transformers import DeepseekV3ForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = DeepseekV3ForCausalLM.from_pretrained("ChrisMcCormick/deepseek-tiny-v0.1")
tokenizer = AutoTokenizer.from_pretrained("ChrisMcCormick/deepseek-tiny-v0.1")

# Generate text (do_sample=True is required for temperature to take effect)
inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
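Note that this is a small research model trained on WikiText, so generations will be far less fluent than those of a production-scale LLM; it is intended for studying attention subspaces rather than downstream use.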
@misc{mccormick2025sharedsubspaces,
  title={Shared Subspaces in Transformer Attention: Investigating Output Latent Spaces},
  author={McCormick, Chris},
  year={2025},
  howpublished={\url{https://github.com/chrisjmccormick/shared-subspaces}}
}
License: Apache 2.0