MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments

This repository contains the MM2SG model, a multimodal large vision-language model for scene graph generation, as presented in the paper "MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments" (accepted at CVPR 2025). The model leverages multimodal inputs (including RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data) to generate semantic scene graphs, enabling a more comprehensive understanding of complex operating room scenarios.
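A semantic scene graph can be thought of as a set of (subject, predicate, object) triplets over the people and objects in the room. The sketch below is a minimal, hypothetical Python representation under that assumption. The class names, the parse_triplets helper, and the "subject, object, predicate; ..." text format it parses are illustrative inventions, not the actual output format or API of MM2SG or the MM-OR repository. The entity and predicate names in the usage example are also only illustrative.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One scene graph edge: subject --predicate--> obj (frozen so it is hashable)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """A scene graph for a single timepoint, stored as a set of unique triplets."""
    relations: set = field(default_factory=set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.relations.add(Relation(subject, predicate, obj))

    def entities(self) -> set:
        # Nodes are all entities that appear as a subject or an object.
        return {r.subject for r in self.relations} | {r.obj for r in self.relations}

    def outgoing(self, subject: str) -> list:
        # (predicate, object) pairs for one subject, sorted for deterministic output.
        return sorted((r.predicate, r.obj) for r in self.relations if r.subject == subject)


def parse_triplets(text: str) -> SceneGraph:
    """Parse a HYPOTHETICAL 'subject, object, predicate; ...' string into a SceneGraph."""
    graph = SceneGraph()
    for chunk in text.split(";"):
        parts = [p.strip() for p in chunk.split(",")]
        if len(parts) != 3 or not all(parts):
            continue  # skip empty or malformed chunks
        subject, obj, predicate = parts
        graph.add(subject, predicate, obj)
    return graph


if __name__ == "__main__":
    g = parse_triplets("head surgeon, patient, drilling; assistant, head surgeon, assisting")
    print(sorted(g.entities()))  # ['assistant', 'head surgeon', 'patient']
    print(g.outgoing("head surgeon"))  # [('drilling', 'patient')]
```

Storing relations in a set means a triplet predicted twice is counted once, which keeps comparisons between predicted and reference graphs simple.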

Paper: https://arxiv.org/abs/2503.02579

Code: https://github.com/egeozsoy/MM-OR

Authors: Ege Özsoy, Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, Nassir Navab

MM-OR Dataset

To download MM-OR, first fill out this form to obtain access to the download script: https://forms.gle/kj47QXEcraQdGidg6. By filling out the form, you agree to the dataset's terms of use.
The download script automatically downloads the entire dataset, which consists of multiple .zip files, and unzips them. Make sure "wget" and "unzip" are installed.
Put the newly created MM-OR_data folder into the root directory of this project.
Optionally, download the 4D-OR dataset, place it in the root directory, and rename it to 4D-OR_data. Instructions are in the official repository: https://github.com/egeozsoy/4D-OR. That repository also contains the newly annotated segmentation annotations and instructions for configuring them.
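Before training or evaluation, it can help to verify the setup described above. The following Python sketch assumes it is run from the project root. It only checks that wget and unzip are on the PATH and that the MM-OR_data folder (and, optionally, 4D-OR_data) exists; check_setup is a hypothetical helper, not part of the repository or its download script.

```python
import shutil
from pathlib import Path

# Tool and folder names come from the instructions above.
REQUIRED_TOOLS = ("wget", "unzip")
REQUIRED_DIR = "MM-OR_data"
OPTIONAL_DIR = "4D-OR_data"


def check_setup(project_root: Path) -> dict:
    """Hypothetical helper: report missing tools and dataset folders under project_root."""
    return {
        # shutil.which returns None when an executable is not on the PATH.
        "missing_tools": [t for t in REQUIRED_TOOLS if shutil.which(t) is None],
        "has_mm_or": (project_root / REQUIRED_DIR).is_dir(),
        "has_4d_or": (project_root / OPTIONAL_DIR).is_dir(),
    }


if __name__ == "__main__":
    root = Path.cwd()  # assumption: the script is started from the project root
    report = check_setup(root)
    if report["missing_tools"]:
        print("Install before running the download script:", ", ".join(report["missing_tools"]))
    if not report["has_mm_or"]:
        print(f"{REQUIRED_DIR} not found in {root}")
    if not report["has_4d_or"]:
        print(f"Optional {OPTIONAL_DIR} not found in {root}")
```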

Panoptic Segmentation and Scene Graph Generation Instructions

Detailed instructions for Panoptic Segmentation and Scene Graph Generation training and evaluation are available within the respective subdirectories of this repository. Please refer to the README files within panoptic_segmentation and scene_graph_generation for specific instructions and requirements.

Citation

@inproceedings{ozsoy2024mmor,
  title={{MM-OR}: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments},
  author={Özsoy, Ege and Pellegrini, Chantal and Czempiel, Tobias and Tristram, Felix and Yuan, Kun and Bani-Harouni, David and Eck, Ulrich and Busam, Benjamin and Keicher, Matthias and Navab, Nassir},
  booktitle={CVPR},
  note={Accepted},
  year={2025}
}

arXiv preprint 2503.02579, published Mar 4, 2025.