---
language:
- en
license: other
pipeline_tag: text-generation
library_name: transformers
tags:
- clinical-nlp
- medical-coding
- icd10
- icd-10-cm
- reasoning
- reinforcement-learning
- grpo
- healthcare
base_model:
- Qwen/Qwen2.5-7B-Instruct
---
# DeepICD-R1-7B

## Model Summary

**DeepICD-R1-7B** is a clinical reasoning language model for **ICD-10-CM diagnosis outcome prediction from admission notes**.

It is derived from **Qwen2.5-7B-Instruct** and trained using the **DeepICD-R1 framework**, which combines structured reasoning traces with reinforcement learning and hierarchical reward signals.

The model is designed to predict a **single ICD-10-CM diagnosis code** from clinical text while producing an interpretable reasoning trace explaining the decision.

The training methodology follows the approach described in the paper:

**DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation**

This work frames clinical diagnosis prediction as a **reasoning task optimized through reinforcement learning**.

---

## Model Details

- **Model name:** DeepICD-R1-7B
- **Organization:** DATEXIS
- **Base model:** Qwen2.5-7B-Instruct
- **Parameters:** ~7B
- **Task:** Single ICD-10-CM diagnosis prediction from admission notes
- **Training paradigm:** Supervised reasoning + reinforcement learning
- **Framework:** VERL RL trainer
- **Domain:** Clinical NLP / healthcare reasoning

Qwen2.5-7B-Instruct is an instruction-tuned language model with roughly 7 billion parameters, designed for instruction following and long-form generation tasks.

---

## Intended Use

This model is intended for **research purposes**, including:

- clinical reasoning research
- ICD-10-CM coding prediction
- reinforcement learning for language models
- reasoning trace generation
- structured prediction from clinical text

### Out-of-Scope Use

This model **must not be used for**:

- medical diagnosis
- clinical decision support
- patient triage
- automated medical coding without expert supervision
- billing or compliance workflows

---

## Training Methodology

The **DeepICD-R1 framework** treats diagnosis prediction as a reasoning problem. Training combines three components.

### 1. Supervised reasoning traces

A dataset of reasoning chains explaining diagnosis predictions.

### 2. Reinforcement learning optimization

Training uses **Group Relative Policy Optimization (GRPO)** to improve reasoning and prediction accuracy.

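
GRPO does not learn a value critic. It samples a group of responses for each prompt (`n = 8` in the rollout settings) and scores each response relative to the rest of its group. The sketch below shows only that normalization step. The function name and the `eps` stabilizer are illustrative, and the code is not taken from the DeepICD-R1 or VERL source. It uses the population standard deviation; some implementations use the sample standard deviation instead.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize one prompt's rewards against its group's mean and standard deviation.

    Each entry of `rewards` is the scalar reward of one sampled response to the
    same prompt. Above-average responses get positive advantages and below-average
    ones get negative advantages. A group with identical rewards yields all zeros.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Eight rollouts for one admission note: 1.0 = exact code, 0.5 = partial credit, 0.0 = wrong.
rewards = [1.0, 0.5, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Because every advantage is measured against sibling samples, a group in which all eight rollouts receive the same reward contributes no policy-gradient signal. Graded rewards, such as the hierarchical term, make such ties less likely.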
### 3. Hierarchical reward signals

Rewards are aligned with the hierarchical structure of ICD codes. The reward function combines:

- **format reward:** correct reasoning + diagnosis structure
- **outcome reward:** correct diagnosis prediction
- **hierarchical reward:** partial credit for correct ICD prefixes

This design encourages models to produce both **accurate diagnoses and structured reasoning**.

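
This card does not say how the three signals are computed or weighted. The sketch below shows one way to combine them for a single response. It makes three assumptions: fixed weights, the `<think>`/`<diagnosis>` layout from the Output Format section, and partial credit equal to the share of the gold code matched as a leading prefix. All names and weight values are hypothetical.

```python
import re

# Hypothetical weights: the card does not state how DeepICD-R1 weights its rewards.
W_FORMAT, W_OUTCOME, W_HIER = 0.2, 0.5, 0.3

# A <think> block followed by a <diagnosis> block, optionally separated by whitespace.
RESPONSE_RE = re.compile(
    r"\s*<think>(.+?)</think>\s*<diagnosis>\s*(.+?)\s*</diagnosis>\s*", re.DOTALL
)


def normalize(code: str) -> str:
    """Uppercase and drop the optional dot so 'm51.16' and 'M5116' compare equal."""
    return code.strip().upper().replace(".", "")


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reward(response: str, gold_code: str) -> float:
    """Weighted sum of format, outcome and hierarchical rewards for one response."""
    match = RESPONSE_RE.fullmatch(response)
    if match is None:
        return 0.0  # malformed output earns no reward in this sketch
    pred, gold = normalize(match.group(2)), normalize(gold_code)
    format_r = 1.0
    outcome_r = 1.0 if pred == gold else 0.0
    # Partial credit: share of the gold code matched as a leading prefix,
    # e.g. gold M5116 vs. predicted M5126 share "M51", giving 3/5.
    hier_r = common_prefix_len(pred, gold) / len(gold)
    return W_FORMAT * format_r + W_OUTCOME * outcome_r + W_HIER * hier_r


print(reward("<think>Disc herniation with radiculopathy.</think><diagnosis>M51.26</diagnosis>", "M5116"))
```

Under these weights an exact match scores 1.0. A wrong code in the right category still earns some reward, which is the purpose of the hierarchical term.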

---

## Training Data

The training task uses **clinical admission notes paired with ICD-10-CM diagnosis codes**, derived from de-identified electronic health record datasets such as **MIMIC-IV**.

Task formulation:

- **Input:** a clinical admission note describing the patient's presentation
- **Output:** a structured reasoning trace and a predicted ICD-10-CM code

---

## Output Format

The model is trained to produce structured outputs that separate the reasoning from the final diagnosis.

### Example

```text
<think>
The patient presents with ...
Symptoms and clinical history suggest ...
...
</think>
<diagnosis>
M5116
</diagnosis>
```

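
Downstream code must recover the diagnosis from this tagged output. The sketch below is a small parser that assumes exactly the tag names shown above. The helper names and the "last `<diagnosis>` block wins" rule are illustrative choices, not part of the released model. The example code `M5116` is written without a dot; in the usual ICD-10-CM notation it is `M51.16`, and the `dotted` helper converts between the two forms.

```python
import re
from dataclasses import dataclass
from typing import Optional

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
DIAG_RE = re.compile(r"<diagnosis>(.*?)</diagnosis>", re.DOTALL)


@dataclass
class ParsedOutput:
    reasoning: str
    code: str  # normalized: uppercase, no dot, e.g. "M5116"


def parse_model_output(text: str) -> Optional[ParsedOutput]:
    """Split a generation into its reasoning trace and diagnosis code.

    Returns None if either block is missing. If several <diagnosis> blocks
    appear, the last one is treated as the final answer.
    """
    thinks = THINK_RE.findall(text)
    diagnoses = DIAG_RE.findall(text)
    if not thinks or not diagnoses:
        return None
    code = diagnoses[-1].strip().upper().replace(".", "")
    return ParsedOutput(reasoning=thinks[0].strip(), code=code)


def dotted(code: str) -> str:
    """Insert the conventional dot after the third character: M5116 -> M51.16."""
    return code if len(code) <= 3 else f"{code[:3]}.{code[3:]}"


parsed = parse_model_output("<think>\nLow back pain radiating to the leg.\n</think>\n<diagnosis>\nM5116\n</diagnosis>")
print(parsed, dotted(parsed.code) if parsed else None)
```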
## Training Configuration

The model was trained using the **VERL reinforcement learning trainer** with **Group Relative Policy Optimization (GRPO)**, following the DeepICD-R1 training framework.

### Core Training Parameters

| Parameter | Value |
|-----------|-------|
| Algorithm | GRPO |
| Training framework | VERL (`verl.trainer.main_ppo`) |
| Base model | Qwen2.5-7B-Instruct |
| Training batch size | 64 |
| PPO mini batch size | 64 |
| PPO micro batch size per GPU | 16 |
| Learning rate | 1e-6 |
| LR warmup steps | 80 |
| Total epochs | 1 |
| Max prompt length | 2048 tokens |
| Max response length | 1024 tokens |

### Rollout / Generation Settings

| Parameter | Value |
|-----------|-------|
| Rollout engine | vLLM |
| Samples per prompt (`n`) | 8 |
| Temperature | 0.9 |
| Top-k | disabled |
| dtype | bfloat16 |
| Tensor parallel size | 1 |
| GPU memory utilization | 0.4 |

### Optimization Details

| Parameter | Value |
|-----------|-------|
| Entropy coefficient | 0.001 |
| KL controller coefficient | 0.001 |
| KL loss | disabled |
| Gradient checkpointing | enabled |
| Torch compile | enabled |
| FSDP param offload | disabled |
| FSDP optimizer offload | disabled |

### Hardware

| Component | Value |
|-----------|-------|
| GPUs | 4 |
| Nodes | 1 |
| Precision | bfloat16 |

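
Taken together, the tables imply a per-step workload, which the sketch below works out. The calculation rests on assumptions about VERL's conventions: `train_batch_size` and `ppo_mini_batch_size` count prompts, each prompt expands into `n` rollouts, a mini-batch is split evenly across GPUs before micro-batching, and there is a single PPO epoch (the epoch count is not listed above). If the trainer counts differently, the derived numbers change.

```python
# Values copied from the configuration tables above.
TRAIN_BATCH_PROMPTS = 64
PPO_MINI_BATCH_PROMPTS = 64
MICRO_BATCH_PER_GPU = 16
SAMPLES_PER_PROMPT = 8
NUM_GPUS = 4
MAX_PROMPT_TOKENS = 2048
MAX_RESPONSE_TOKENS = 1024

# Each prompt is sampled n times, so one step generates and scores this many responses.
responses_per_step = TRAIN_BATCH_PROMPTS * SAMPLES_PER_PROMPT

# Optimizer updates per step, assuming one PPO epoch over the batch.
updates_per_step = TRAIN_BATCH_PROMPTS // PPO_MINI_BATCH_PROMPTS

# Responses each GPU handles per mini-batch, and the gradient-accumulation steps this implies.
responses_per_gpu = PPO_MINI_BATCH_PROMPTS * SAMPLES_PER_PROMPT // NUM_GPUS
grad_accum_steps = responses_per_gpu // MICRO_BATCH_PER_GPU

# Upper bound on a single training sequence (prompt + response).
max_seq_tokens = MAX_PROMPT_TOKENS + MAX_RESPONSE_TOKENS

print(responses_per_step, updates_per_step, responses_per_gpu, grad_accum_steps, max_seq_tokens)
```

Under these assumptions, each step yields 512 responses and one optimizer update. Each GPU processes 128 responses per mini-batch as 8 micro-batches of 16, and sequences are capped at 3,072 tokens.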
### Reward Function

Training uses a **custom batched reward function** that combines several reward signals:

- **Outcome reward:** correct ICD-10 prediction
- **Format reward:** correct `<think>` and `<diagnosis>` structure
- **Hierarchical reward:** partial credit for ICD prefix matches
- **Reasoning reward:** encourages meaningful reasoning traces
- **LLM-based reward:** optional external judge scoring

Together, these rewards steer the model toward producing **both accurate diagnoses and structured reasoning traces**.

The reasoning trace provides transparency into how the diagnosis was derived from the clinical note.

---

## Evaluation

Evaluation follows the methodology described in the **DeepICD-R1 paper**. Performance is measured using **macro-averaged F1 scores** at multiple levels of the ICD hierarchy.

| Level | Description |
|-------|-------------|
| Chapter | Broad chapter grouping of codes (e.g., Chapter 13, `M00`-`M99`) |
| Category | First three characters of the code (e.g., `M51`) |
| Full code | Complete ICD-10-CM code (e.g., `M5116`) |

Hierarchical evaluation allows partial credit when the model predicts the correct high-level diagnostic category even if the full code is incorrect.

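
The sketch below implements the hierarchical metric under several assumptions: one gold and one predicted code per note, macro-F1 averaged over the union of gold and predicted labels at each level, and the official ICD-10-CM chapter ranges for the chapter level. The paper's exact label set and averaging may differ, and the function names are illustrative.

```python
from bisect import bisect_right
from collections import Counter

# Starting three-character category of each ICD-10-CM chapter, in sorted order.
# Chapter 19 spans S00-T88, chapter 20 spans V00-Y99, and chapter 22 (U00-U85)
# holds special-purpose codes such as U07.1.
CHAPTER_STARTS = [
    ("A00", 1), ("C00", 2), ("D50", 3), ("E00", 4), ("F01", 5), ("G00", 6),
    ("H00", 7), ("H60", 8), ("I00", 9), ("J00", 10), ("K00", 11), ("L00", 12),
    ("M00", 13), ("N00", 14), ("O00", 15), ("P00", 16), ("Q00", 17), ("R00", 18),
    ("S00", 19), ("U00", 22), ("V00", 20), ("Z00", 21),
]
STARTS = [start for start, _ in CHAPTER_STARTS]


def chapter(code: str) -> str:
    """Map a code to its chapter by locating its category among the chapter starts."""
    idx = bisect_right(STARTS, code[:3]) - 1
    return f"chapter {CHAPTER_STARTS[idx][1]}"


LEVELS = {
    "chapter": chapter,
    "category": lambda code: code[:3],
    "full code": lambda code: code,
}


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Single-label macro-F1, averaged over every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0
    # Per-label F1 = 2TP / (2TP + FP + FN); each label occurs somewhere, so the denominator is > 0.
    return sum(2 * tp[lab] / (2 * tp[lab] + fp[lab] + fn[lab]) for lab in labels) / len(labels)


def hierarchical_macro_f1(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Macro-F1 at chapter, category and full-code level; codes may contain dots."""
    gold = [g.strip().upper().replace(".", "") for g in gold]
    pred = [p.strip().upper().replace(".", "") for p in pred]
    return {
        name: macro_f1([to_level(g) for g in gold], [to_level(p) for p in pred])
        for name, to_level in LEVELS.items()
    }


print(hierarchical_macro_f1(["M51.16", "I10", "J18.9"], ["M5126", "I10", "J181"]))
```

In the usage line, two of the three full codes are wrong, so full-code macro-F1 is low. Both wrong predictions still land in the right category and chapter, so those levels score perfectly.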

---

## Limitations

Models following the **DeepICD-R1 framework** share several limitations.

### Dataset limitations

- Training data consists primarily of **English clinical notes**
- The data distribution reflects **hospital-specific patient populations**
- ICD labels are **highly imbalanced**, which degrades performance on rare diagnoses

### Model limitations

- Reasoning traces may appear convincing while being incorrect
- Predictions may fail for rare or long-tail diagnoses
- Models may demonstrate **premature diagnostic closure**
- Reinforcement learning rewards are only proxies for expert feedback

---

## Ethical Considerations

This model is trained on **de-identified clinical data** and intended strictly for research.

### Potential risks

- propagation of dataset biases
- overconfidence in generated reasoning
- misuse in clinical decision making

### Appropriate safeguards

- expert oversight
- dataset bias evaluation
- fairness audits
- controlled deployment environments

---

## Hardware and Training Setup

Typical training configuration for models in the DeepICD-R1 family includes:

- **GPUs:** multi-GPU training (4–8 GPUs)
- **Precision:** bfloat16
- **Rollout engine:** vLLM
- **Training framework:** VERL PPO / GRPO trainer
- **Sampling:** multiple rollouts per prompt

---

## Usage

### Transformers Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "DATEXIS/DeepICD-R1-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # needs the `accelerate` package
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)

prompt = """You are a clinical reasoning model.
Given the following admission note,
produce reasoning in <think> tags
and a final ICD-10 diagnosis in <diagnosis> tags.

[ADMISSION NOTE]
"""

# The base model is a chat model, so format the prompt with its chat template.
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Training allowed responses of up to 1024 tokens; a smaller budget may cut off the reasoning.
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens, skipping the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```

## Recommended Inference Practices

- Use prompts consistent with the training format.
- Validate predicted ICD-10 codes against official code formats.
- Always review predictions with medical experts.
- Avoid exposing reasoning traces in safety-critical settings without verification.

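
A format-only check for predicted codes is easy to script. The regular expression below approximates the shape of an ICD-10-CM code: a letter, a digit, and an alphanumeric third character, followed by up to four more alphanumeric characters, with an optional dot after the third. It rejects malformed strings but cannot tell whether a well-formed code actually exists. That still requires a lookup against the official code set for the relevant fiscal year. The pattern and function name are illustrative assumptions.

```python
import re

# Approximate ICD-10-CM shape: 3-character category plus up to 4 extension characters.
# Letters can appear in the third position (e.g. O9A) and in extensions
# (e.g. the placeholder X or 7th characters such as A, D, S).
ICD10CM_FORMAT = re.compile(r"[A-Z][0-9][0-9A-Z](?:\.?[0-9A-Z]{1,4})?")


def is_well_formed_icd10cm(code: str) -> bool:
    """True if `code` has a plausible ICD-10-CM shape; existence is NOT checked."""
    return ICD10CM_FORMAT.fullmatch(code.strip().upper()) is not None


for candidate in ["M5116", "M51.16", "S72.001A", "M51.", "5116"]:
    print(candidate, is_well_formed_icd10cm(candidate))
```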

---

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{roehr2026deepicdr1,
  title={DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation},
  author={R{\"o}hr, Tom and Steffek, Thomas and Teucher, Roman and Bressem, Keno and others},
  booktitle={Proceedings of LREC-COLING},
  year={2026}
}