MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios Paper • 2603.28130 • Published 8 days ago • 8
Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders Paper • 2603.19209 • Published 19 days ago • 5
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published 23 days ago • 27
Omnilingual MT: Machine Translation for 1,600 Languages Paper • 2603.16309 • Published 21 days ago • 20
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 26 days ago • 64
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model Paper • 2602.17807 • Published Feb 19 • 7
Causal-JEPA: Learning World Models through Object-Level Latent Interventions Paper • 2602.11389 • Published Feb 11 • 8
UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders Paper • 2601.17950 • Published Jan 25 • 4
TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration Paper • 2601.04544 • Published Jan 8 • 6
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers Paper • 2601.04890 • Published Jan 8 • 43
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion Paper • 2512.19535 • Published Dec 22, 2025 • 12
Post: deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-token-to-performance ratio
> covers 100 languages
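The post's point about concatenating CLIP and SAM features can be sketched as simple channel-wise fusion of per-patch feature maps. This is a minimal illustration only: the shapes, dimensions, and variable names below are assumptions for the example, not DeepSeek-OCR's actual architecture.

```python
import numpy as np

# Hypothetical per-patch feature maps from two vision encoders.
# Shapes are illustrative assumptions, not DeepSeek-OCR's real dimensions.
num_patches = 196              # e.g. a 14x14 patch grid
clip_dim, sam_dim = 768, 256   # assumed encoder output widths

rng = np.random.default_rng(0)
clip_feats = rng.standard_normal((num_patches, clip_dim))  # semantic features
sam_feats = rng.standard_normal((num_patches, sam_dim))    # spatial/grounding features

# Channel-wise concatenation: each patch token now carries both views,
# one simple way to combine semantic and grounding signals in one token.
fused = np.concatenate([clip_feats, sam_feats], axis=-1)
print(fused.shape)  # (196, 1024)
```

Each of the 196 patch tokens ends up with a 1024-dim vector (768 + 256), so downstream layers see both encoders' information per spatial location.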