Collections
Collections including paper arxiv:2603.15619

- FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use (arXiv:2603.08262)
- On-Policy Context Distillation for Language Models (arXiv:2602.12275)
- Online Experiential Learning for Language Models (arXiv:2603.16856)
- Mixture-of-Depths Attention (arXiv:2603.15619)

- Attention Is All You Need (arXiv:1706.03762)
- Attention Residuals (arXiv:2603.15031)
- Mixture-of-Depths Attention (arXiv:2603.15619)
- Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models (arXiv:2603.15557)

- OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation (arXiv:2601.15369)
- Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model (arXiv:2601.15892)
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders (arXiv:2601.16208)
- NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems (arXiv:2601.11004)

- MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models (arXiv:2511.18373)
- Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO (arXiv:2511.13288)
- Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens (arXiv:2511.19418)
- SAM 3: Segment Anything with Concepts (arXiv:2511.16719)

- Selective Attention Improves Transformer (arXiv:2410.02703)
- Differential Transformer (arXiv:2410.05258)
- TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention (arXiv:2410.05076)
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs (arXiv:2410.13276)

- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep (arXiv:2601.19895)
- Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers (arXiv:2601.17367)
- Small-scale proxies for large-scale Transformer training instabilities (arXiv:2309.14322)
- Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training (arXiv:2602.00747)

- YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection (arXiv:2512.23273)
- A 58-Addition, Rank-23 Scheme for General 3x3 Matrix Multiplication (arXiv:2512.21980)
- Step-DeepResearch Technical Report (arXiv:2512.20491)
- SAM Audio: Segment Anything in Audio (arXiv:2512.18099)

- Test-Time Scaling with Reflective Generative Model (arXiv:2507.01951)
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (arXiv:2502.05171)
- Autoregressive Diffusion Models (arXiv:2110.02037)
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling (arXiv:2502.09509)

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data (arXiv:2404.15653)
- MoDE: CLIP Data Experts via Clustering (arXiv:2404.16030)
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning (arXiv:2405.12130)
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (arXiv:2405.12981)