
Collections

Discover the best community collections!

Collections including paper arxiv:2512.15586

Byte Level Models - Tokenizer-Free Language Models

Byte-level language models for tokenizer-free NLP, multilingual text, and raw byte processing

allenai/Bolmo-1B

Text Generation • 1B • Updated Dec 22, 2025 • 1.24k • 48
allenai/Bolmo-7B

Text Generation • Updated Dec 22, 2025 • 1.62k • 55
allenai/bolmo_mix

Updated Dec 22, 2025 • 475 • 9
Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published Dec 17, 2025 • 17

Representation & Optimization

Understanding representations sheds light on optimization

about 1 month ago

Nuclear Norm Regularization for Deep Learning

Paper • 2405.14544 • Published May 23, 2024 • 1
Token embeddings violate the manifold hypothesis

Paper • 2504.01002 • Published Apr 1, 2025 • 1
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers

Paper • 2403.10476 • Published Mar 15, 2024 • 1
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

Paper • 2504.00254 • Published Mar 31, 2025 • 1

interesting architecture

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 29
Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11, 2025 • 90
Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31, 2025 • 25
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 9

Artifacts for the Bolmo release: https://allenai.org/papers/bolmo.

allenai/Bolmo-7B

Text Generation • Updated Dec 22, 2025 • 1.62k • 55
allenai/Bolmo-1B

Text Generation • 1B • Updated Dec 22, 2025 • 1.24k • 48
allenai/bolmo_mix

Updated Dec 22, 2025 • 475 • 9
Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published Dec 17, 2025 • 17

Di♪♪Rhythm 🎶: Blazingly Fast and Embarrassingly Simple Song Generation

Space • Running on Zero • Featured • 684
Bolmo: Byteifying the Next Generation of Language Models

Paper • 2512.15586 • Published Dec 17, 2025 • 17
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning

Paper • 2505.10557 • Published May 15, 2025 • 48
salakash/SamKash-Tolstoy

Text Generation • Updated Dec 20, 2025 • 5.24k • 148

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24, 2024 • 29
MoDE: CLIP Data Experts via Clustering

Paper • 2404.16030 • Published Apr 24, 2024 • 15
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 50
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

Paper • 2405.12981 • Published May 21, 2024 • 33
