Unique3D
Create a colored 3D model with 1M faces from an image!
Try PaliGemma on document understanding tasks
Generate custom audio clips from text prompts
Annotate and describe images with text prompts
Edit your video with text prompts and style control
Video upscaler/restorer
Annotate videos with object boxes and labels using captions
Generate images from captions or enhance prompts with AI
Generate transcription and summary from uploaded videos
Chat about images by uploading them
Build and run language models visually
Upscale and enhance images with tile-aware AI
In-browser speech recognition w/ word-level timestamps
High-fidelity Virtual Try-on
Video-to-Audio Generation with Hidden Alignment
Multimodal Image-to-Video
Transcribe audio in any language using text data
Generate high-quality images from text prompts
Aesthetically Controllable Text-Driven Stylization without Training
Generate lifelike video animations from images and audio
Try on clothes virtually on a photo using diffusion models
Generate enhanced images by blending foreground with custom backgrounds
Generate virtual try-on images of clothes on a person
Text-to-Video
Answer questions about uploaded images or videos
Turn spoken words into AI chat responses
Convert image text to markdown format
Generate passport-ready ID photos from a portrait
Answer questions about uploaded images
Travel through the model latent space
Generate a novel-view video from a single image
Analyse any image with Llama3.2
Fill and edit images using masks
Convert PDFs to individual page images
Generate document retrieval queries from a page image
Answer questions about uploaded images and documents
Transcribe audio or YouTube videos into text
Generate music from text descriptions
Generate audio-ready script from documents
Ultra-high resolution image synthesis
Generate or edit realistic audio from text prompts
VLMEvalKit Evaluation Results Collection
Generate personalized research profiles and chat with Arxiv Copilot
Run code and get instant results with Qwen Code Interpreter
High-fidelity Virtual Try-on
Describe image contents with prompts
Visual Retrieval with ColPali and Vespa
Using RAG LLM to assist your academic writing
Generate realistic person images with new clothes or poses