MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
This repository contains the MM2SG model, a multimodal large vision-language model for scene graph generation, as presented in the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments" (accepted at CVPR 2025). The model leverages multimodal inputs (including RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data) to generate semantic scene graphs, enabling a more comprehensive understanding of complex operating room scenarios.
Paper: https://arxiv.org/abs/2503.02579
Code: https://github.com/egeozsoy/MM-OR
Authors: Ege Özsoy, Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, Nassir Navab
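A scene graph of the kind described above is commonly represented as a set of (subject, predicate, object) triplets. The sketch below illustrates that general data structure only; it is not the MM2SG output format. The `Triplet` class, the `to_adjacency` helper, and every entity and predicate label are hypothetical and chosen for illustration. See the code repository for the actual vocabulary and format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One directed edge of a scene graph: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


def to_adjacency(triplets):
    """Group triplets by subject so each entity maps to its outgoing relations."""
    graph = defaultdict(list)
    for t in triplets:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


# Hypothetical labels for illustration only; not the dataset's actual vocabulary.
example = [
    Triplet("head_surgeon", "drilling", "patient"),
    Triplet("patient", "lying_on", "operating_table"),
    Triplet("nurse", "close_to", "instrument_table"),
]

scene_graph = to_adjacency(example)
```

Grouping by subject makes it easy to ask what a given entity is doing at one time point, for example `scene_graph["head_surgeon"]`.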
MM-OR Dataset
- To download MM-OR, first fill out this form https://forms.gle/kj47QXEcraQdGidg6 to get access to the download script. By filling out this form, you agree to the terms of use of the dataset.
- Run the download script, which automatically downloads the entire dataset (multiple .zip files) and unzips it. Make sure you have "wget" and "unzip" installed.
- Put the newly created MM-OR_data folder into the root directory of this project.
- Optionally, download the 4D-OR dataset, put it in the root directory, and rename it 4D-OR_data. Instructions are in the official repo: https://github.com/egeozsoy/4D-OR. That repository also provides the newly annotated segmentation annotations and explains how to configure them.
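The setup steps above can be checked with a small preflight script. This is a minimal sketch, not part of the official repository: the function names `missing_tools` and `check_data_layout` are hypothetical, and it assumes the script is run from the project root (or given its path).

```python
import shutil
from pathlib import Path


def missing_tools(tools=("wget", "unzip")):
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_data_layout(project_root="."):
    """Report which dataset folders exist directly under the project root."""
    root = Path(project_root)
    return {
        "MM-OR_data": (root / "MM-OR_data").is_dir(),  # required
        "4D-OR_data": (root / "4D-OR_data").is_dir(),  # optional
    }


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"Missing tool: {tool}")
    for name, present in check_data_layout().items():
        print(f"{name}: {'found' if present else 'not found'}")
```

Running it before the download script shows whether "wget" and "unzip" are available. Running it afterwards confirms that the data folders ended up where the project expects them.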
Panoptic Segmentation and Scene Graph Generation Instructions
Detailed instructions for Panoptic Segmentation and Scene Graph Generation training and evaluation are available within the respective subdirectories of this repository. Please refer to the README files within panoptic_segmentation and scene_graph_generation for specific instructions and requirements.
@inproceedings{ozsoy2024mmor,
title={MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High Intensity Surgical Environments},
author={Özsoy, Ege and Pellegrini, Chantal and Czempiel, Tobias and Tristram, Felix and Yuan, Kun and Bani-Harouni, David and Eck, Ulrich and Busam, Benjamin and Keicher, Matthias and Navab, Nassir},
booktitle={CVPR},
note={Accepted},
year={2025}
}