SpaceOm
This model is evaluated in the paper SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence. The code for the SpaCE-10 benchmark is available at: https://github.com/Cuzyoung/SpaCE-10.
Model creator: remyxai
Original model: SpaceOm
GGUF quantization: llama.cpp commit 2baf07727f921d9a4a1b63a2eff941e95d0488ed
Description
Model Overview
SpaceOm improves over SpaceThinker by adding:
- the target module o_proj in LoRA fine-tuning
- the SpaceOm dataset for longer reasoning traces
- Robo2VLM-Reasoning dataset for more robotics domain and MCVQA examples
The choice to include o_proj among the target modules in LoRA fine-tuning was inspired by the study "Who Reasons in the Large Language Models?", which argues for the importance of this module in reasoning models.
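To make the effect of adding o_proj concrete, here is a minimal sketch of how a list of target module names picks out layers for LoRA adapters. The module paths, the baseline target list (q_proj, v_proj), and the helper `select_lora_targets` are hypothetical illustrations; the card itself states only that o_proj was added to the target modules.

```python
# Illustrative only: these module paths are hypothetical stand-ins for one
# decoder layer and are not read from the real SpaceOm checkpoint.
MODULE_NAMES = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.down_proj",
]


def select_lora_targets(module_names, target_modules):
    """Return the modules whose final name component is in target_modules."""
    targets = set(target_modules)
    return [name for name in module_names if name.rsplit(".", 1)[-1] in targets]


# Assumed baseline; the card does not list the other target modules.
baseline_targets = ["q_proj", "v_proj"]
with_o_proj = baseline_targets + ["o_proj"]

baseline_selected = select_lora_targets(MODULE_NAMES, baseline_targets)
spaceom_selected = select_lora_targets(MODULE_NAMES, with_o_proj)

# Modules that gain an adapter only because o_proj was added.
newly_adapted = [m for m in spaceom_selected if m not in baseline_selected]
print(newly_adapted)
```

In this toy layer the only newly adapted module is the attention output projection, which is the module the cited study singles out.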
The reasoning traces in the SpaceThinker dataset average ~200 "thinking" tokens, so we've included longer reasoning traces in the training data to help the model use more tokens in reasoning.
Aiming to improve alignment for robotics applications, we've trained with synthetic reasoning traces, derived from the Robo2VLM-1 dataset.
Model Evaluation
SpatialScore - 3B and 4B models
| Model | Overall | Count. | Obj.-Loc. | Pos.-Rel. | Dist. | Obj.-Prop. | Cam.&IT. | Tracking | Others |
|---|---|---|---|---|---|---|---|---|---|
| SpaceQwen2.5-VL-3B | 42.31 | 45.01 | 49.78 | 57.88 | 27.36 | 34.11 | 26.34 | 26.44 | 43.58 |
| SpatialBot-Phi2-3B | 41.65 | 53.23 | 54.32 | 55.40 | 27.12 | 26.10 | 24.21 | 27.57 | 41.66 |
| Kimi-VL-3B | 51.48 | 49.22 | 61.99 | 61.34 | 38.27 | 46.74 | 33.75 | 56.28 | 47.23 |
| Kimi-VL-3B-Thinking | 52.60 | 52.66 | 58.93 | 63.28 | 39.38 | 42.57 | 32.00 | 46.97 | 42.73 |
| Qwen2.5-VL-3B | 47.90 | 46.62 | 55.55 | 62.23 | 32.39 | 32.97 | 30.66 | 36.90 | 42.19 |
| InternVL2.5-4B | 49.82 | 53.32 | 62.02 | 62.02 | 32.80 | 27.00 | 32.49 | 37.02 | 48.95 |
| SpaceOm (3B) | 49.00 | 56.00 | 54.00 | 65.00 | 41.00 | 50.00 | 36.00 | 42.00 | 47.00 |
See all results for evaluating SpaceOm on the SpatialScore benchmark.
Compared to SpaceQwen, this model scores higher in every category.
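The comparison with SpaceQwen can be checked directly against the SpatialScore table. The sketch below copies the SpaceQwen2.5-VL-3B and SpaceOm (3B) rows from that table and computes the per-category margin; the variable names are illustrative.

```python
# Rows copied from the SpatialScore table (3B and 4B models).
CATEGORIES = ["Overall", "Count.", "Obj.-Loc.", "Pos.-Rel.", "Dist.",
              "Obj.-Prop.", "Cam.&IT.", "Tracking", "Others"]
SPACEQWEN = [42.31, 45.01, 49.78, 57.88, 27.36, 34.11, 26.34, 26.44, 43.58]
SPACEOM = [49.00, 56.00, 54.00, 65.00, 41.00, 50.00, 36.00, 42.00, 47.00]

# Margin in points: SpaceOm minus SpaceQwen, per category.
margins = {
    category: round(om - qwen, 2)
    for category, om, qwen in zip(CATEGORIES, SPACEOM, SPACEQWEN)
}

for category, margin in margins.items():
    print(f"{category:<11} {margin:+.2f}")

# True only if SpaceOm leads in every column, including Overall.
spaceom_leads_everywhere = all(margin > 0 for margin in margins.values())
smallest_gap = min(margins, key=margins.get)
print(spaceom_leads_everywhere, smallest_gap)
```

Every margin is positive; the narrowest gap is in the Others column (3.42 points) and the widest in Obj.-Prop. (15.89 points).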
Compared with SpaceThinker: [Figure: SpaceOm vs. SpaceThinker on SpatialScore]
SpaCE-10 Benchmark Comparison
This table compares SpaceOm evaluated using GPT scoring against several top models from the SpaCE-10 benchmark leaderboard. Top scores in each category are bolded.
| Model | EQ | SQ | SA | OO | OS | EP | FR | SP | Source |
|---|---|---|---|---|---|---|---|---|---|
| SpaceOm | 32.47 | 24.81 | **47.63** | **50.00** | **32.52** | 9.12 | 37.04 | 25.00 | GPT Eval |
| Qwen2.5-VL-7B-Instruct | 32.70 | 31.00 | 41.30 | 32.10 | 27.60 | 15.40 | 26.30 | 27.50 | Table |
| LLaVA-OneVision-7B | **37.40** | 36.20 | 42.90 | 44.20 | 27.10 | 11.20 | **45.60** | 27.20 | Table |
| VILA1.5-7B | 30.20 | **38.60** | 39.90 | 44.10 | 16.50 | **35.10** | 30.10 | **37.60** | Table |
| InternVL2.5-4B | 34.30 | 34.40 | 43.60 | 44.60 | 16.10 | 30.10 | 33.70 | 36.70 | Table |
Legend:
- EQ: Entity Quantification
- SQ: Scene Quantification
- SA: Size Assessment
- OO: Object-Object spatial relations
- OS: Object-Scene spatial relations
- EP: Entity Presence
- FR: Functional Reasoning
- SP: Spatial Planning
ℹ️ Note: Scores for SpaceOm are generated via gpt_eval_score on single-choice (*-single) versions of the SpaCE-10 benchmark tasks. Other entries reflect leaderboard accuracy scores from the official SpaCE-10 evaluation table.
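As a quick way to see which model leads each SpaCE-10 category, the sketch below copies the five rows from the comparison table and picks the highest score per column. The helper name `best_per_category` is illustrative. Because the SpaceOm row comes from GPT scoring and the other rows from leaderboard accuracy, the ranking is indicative rather than a like-for-like comparison.

```python
# Rows copied from the SpaCE-10 comparison table, in column order.
CATEGORIES = ["EQ", "SQ", "SA", "OO", "OS", "EP", "FR", "SP"]
SCORES = {
    "SpaceOm": [32.47, 24.81, 47.63, 50.00, 32.52, 9.12, 37.04, 25.00],
    "Qwen2.5-VL-7B-Instruct": [32.70, 31.00, 41.30, 32.10, 27.60, 15.40, 26.30, 27.50],
    "LLaVA-OneVision-7B": [37.40, 36.20, 42.90, 44.20, 27.10, 11.20, 45.60, 27.20],
    "VILA1.5-7B": [30.20, 38.60, 39.90, 44.10, 16.50, 35.10, 30.10, 37.60],
    "InternVL2.5-4B": [34.30, 34.40, 43.60, 44.60, 16.10, 30.10, 33.70, 36.70],
}


def best_per_category(scores, categories):
    """Map each category to the (model, score) pair with the highest score."""
    best = {}
    for index, category in enumerate(categories):
        model = max(scores, key=lambda name: scores[name][index])
        best[category] = (model, scores[model][index])
    return best


best = best_per_category(SCORES, CATEGORIES)
for category, (model, score) in best.items():
    print(f"{category}: {model} ({score:.2f})")

# Categories in which SpaceOm has the highest listed score.
spaceom_wins = [c for c, (model, _) in best.items() if model == "SpaceOm"]
print(spaceom_wins)
```

On these numbers SpaceOm has the top listed score in SA, OO, and OS, while Entity Presence (EP) is its weakest category relative to the other models.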
Read more about the SpaCE-10 benchmark
Limitations
- Performance may degrade in cluttered environments or with certain camera perspectives.
- This model was fine-tuned using synthetic reasoning over an internet image dataset.
- Multimodal biases inherent to the base model (Qwen2.5-VL) may persist.
- Not intended for use in safety-critical or legal decision-making.
Users are encouraged to evaluate outputs critically and consider fine-tuning for domain-specific safety and performance. Distances estimated using autoregressive transformers may help in higher-order reasoning for planning and behavior but may not be suitable replacements for measurements taken with high-precision sensors, calibrated stereo vision systems, or specialist monocular depth estimation models capable of more accurate, pixel-wise predictions and real-time performance.
Citation
@article{chen2024spatialvlm,
title = {SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities},
author = {Chen, Boyuan and Xu, Zhuo and Kirmani, Sean and Ichter, Brian and Driess, Danny and Florence, Pete and Sadigh, Dorsa and Guibas, Leonidas and Xia, Fei},
journal = {arXiv preprint arXiv:2401.12168},
year = {2024},
url = {https://arxiv.org/abs/2401.12168},
}
@misc{qwen2.5-VL,
title = {Qwen2.5-VL},
url = {https://qwenlm.github.io/blog/qwen2.5-vl/},
author = {Qwen Team},
month = {January},
year = {2025}
}
@misc{vl-thinking2025,
title = {SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models},
author={Hardy Chen and Haoqin Tu and Fali Wang and Hui Liu and Xianfeng Tang and Xinya Du and Yuyin Zhou and Cihang Xie},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/UCSC-VLAA/VLAA-Thinking}},
}
@article{wu2025spatialscore,
author = {Wu, Haoning and Huang, Xiao and Chen, Yaohui and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
title = {SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding},
journal = {arXiv preprint arXiv:2505.17012},
year = {2025},
}
@article{gong2025space10,
title = {SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence},
author = {Ziyang Gong and Wenhao Li and Oliver Ma and Songyuan Li and Jiayi Ji and Xue Yang and Gen Luo and Junchi Yan and Rongrong Ji},
journal = {arXiv preprint arXiv:2506.07966},
year = {2025},
url = {https://arxiv.org/abs/2506.07966}
}