Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer
Paper • 2603.19227 • Published • 42
Feeling and building multimodal intelligence.
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling