🧪 Protein-Ligand Design: LoRA adapter for poolside/Laguna-XS.2
poolside Laguna Hackathon submission, Team JAMMY. A LoRA adapter trained with reinforcement learning (GRPO) to make
poolside/Laguna-XS.2 reason like a bench computational chemist / protein engineer: measure with tools, then commit an answer.
This is the trained adapter that goes with our environment and dataset:
➡️ Gym / dataset: poolside-laguna-hackathon/protein-ligand-design
The gym hands the model a molecule or protein plus a scientist's question, and the model must call CPU-only cheminformatics/proteomics tools (RDKit + Biopython) to measure the answer before committing. The reward is answer correctness only, and every ground-truth answer is computed by those same tools, so scoring is exact.
What this adapter is
| Type | PEFT LoRA adapter (not a merged model) |
| Base model | poolside/Laguna-XS.2 |
| Rank r | 16 |
| lora_alpha | 32 |
| lora_dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, experts |
| Dtype | F32 |
LoRA is applied to the attention projections and the MoE expert MLPs, which is why the adapter is large (~4.6 GB) despite being rank-16.
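The size can be sanity-checked with simple arithmetic. At F32 (4 bytes per parameter), ~4.6 GB corresponds to roughly 1.15 billion adapter parameters, taking 1 GB as 10^9 bytes and ignoring file overhead. The sketch below also shows why expert MLPs dominate: a LoRA pair on a `d_in x d_out` weight adds `r * (d_in + d_out)` parameters, and that cost repeats for every expert if each expert gets its own pair. The layer shapes and expert count are hypothetical placeholders, not Laguna-XS.2's actual configuration.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Adapter size from the model card, converted to a parameter count.
BYTES_PER_F32 = 4
adapter_bytes = 4.6e9  # ~4.6 GB, assuming 1 GB = 1e9 bytes
approx_params = adapter_bytes / BYTES_PER_F32
print(f"{approx_params / 1e9:.2f}B adapter parameters")

# HYPOTHETICAL shapes for one layer (not the real Laguna-XS.2 config).
hidden, expert_ffn, n_experts, r = 2048, 1024, 64, 16

# q/k/v/o projections, assumed square (hidden x hidden) for illustration.
attn = 4 * lora_params(hidden, hidden, r)

# gate_proj and up_proj map hidden -> expert_ffn; down_proj maps back.
one_expert = 2 * lora_params(hidden, expert_ffn, r) + lora_params(expert_ffn, hidden, r)
experts = n_experts * one_expert  # assumes one LoRA pair per expert

print(f"attention: {attn:,}  experts: {experts:,}")
```

Under these placeholder shapes the expert adapters outnumber the attention adapters by more than an order of magnitude per layer, which is the effect the paragraph above describes.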
Training
Trained on Prime Intellect Hosted Training (shared run dashboard). Configuration:
- Algorithm: GRPO
- Reward: binary final-answer correctness (1.0 correct / 0.0 wrong); using tools is the means, never the reward
- Learning rate: 1e-5
- Rollouts per example: 16
- Batch size: 128
- Max tokens: 4096, thinking enabled
- Steps: 80 (full run); the final checkpoint is the (tied-)best on eval.
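A binary final-answer reward of this kind can be sketched in a few lines. This is an illustration only: the gym's actual answer-matching logic is not described here, and the `normalize` helper (trim, lowercase, drop a trailing period) is a hypothetical choice.

```python
def normalize(answer: str) -> str:
    """Hypothetical normalization: trim whitespace, lowercase, drop a trailing period."""
    return answer.strip().lower().rstrip(".")

def binary_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the committed answer matches the ground truth, else 0.0.

    Tool calls made along the way do not enter the reward at all.
    """
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0

print(binary_reward(" Yes. ", "yes"))
print(binary_reward("no", "yes"))
```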
What training changed
On a held-out set of 60 questions, eval accuracy (avg@1) rose from the base
model's 91.7% to 96.7%, a gain of 5 percentage points, with the final
checkpoint tied for best. Training was stable throughout: reward held around 0.9
(normal GRPO variance, dipping to ~0.65 and recovering), completions stayed at ~900
tokens (no ballooning), and there were no truncated or failed rollouts.
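To put the percentages in terms of questions: on a 60-question set, the reported figures are consistent with 55 and 58 correct answers. These counts are inferred from the rounded percentages, not reported directly, so treat them as an assumption.

```python
def implied_correct(accuracy_pct: float, n_questions: int) -> int:
    """Nearest whole number of correct answers implied by a rounded accuracy."""
    return round(accuracy_pct / 100 * n_questions)

N_QUESTIONS = 60
base_correct = implied_correct(91.7, N_QUESTIONS)
adapter_correct = implied_correct(96.7, N_QUESTIONS)

print(f"base:    {base_correct}/{N_QUESTIONS} = {100 * base_correct / N_QUESTIONS:.1f}%")
print(f"adapter: {adapter_correct}/{N_QUESTIONS} = {100 * adapter_correct / N_QUESTIONS:.1f}%")
print(f"additional questions answered correctly: {adapter_correct - base_correct}")
```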
A concrete example. One question the base model got wrong before training:
For a kinase backbone-carbonyl halogen bond, is 4-bromo-7-azaindole (Brc1ccc2[nH]ccc2n1) a candidate: it must carry a heavy halogen (Cl/Br/I) for the halogen bond, an aromatic ring for π-stacking, and LogP between 1 and 3? (correct answer: yes)
The base model ran the tools (substructure matches for the halogen and ring, plus descriptors) but committed the wrong verdict. This is the kind of tool-grounded judgement the adapter sharpens.
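The final step in a question like this is combining the tool measurements into a single verdict. A minimal sketch of that decision rule follows, assuming the three criteria are combined with a logical AND and that the LogP bounds are inclusive. The `Measurements` type and the example values are hypothetical stand-ins for real tool outputs; the LogP value in particular is illustrative, not a measured descriptor.

```python
from dataclasses import dataclass

@dataclass
class Measurements:
    """Hypothetical container for tool outputs on one molecule."""
    has_heavy_halogen: bool  # substructure match for Cl, Br, or I
    has_aromatic_ring: bool  # substructure match for an aromatic ring
    logp: float              # computed LogP descriptor

def is_candidate(m: Measurements, logp_min: float = 1.0, logp_max: float = 3.0) -> bool:
    """All three criteria must hold; LogP bounds are assumed inclusive."""
    return m.has_heavy_halogen and m.has_aromatic_ring and logp_min <= m.logp <= logp_max

# Illustrative values only: the LogP here is a placeholder, not a tool result.
example = Measurements(has_heavy_halogen=True, has_aromatic_ring=True, logp=2.0)
print("yes" if is_candidate(example) else "no")
```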
Honest framing: the base model is already strong on this set (~92%), so the available headroom was modest, and the adapter captured most of it.
Choosing the training recipe
The stable hyperparameters above didn't come for free: they're the output of a
sweep on a precursor environment, allan/science-gym-bio. The lesson: learning
rate is the stability lever (5e-5 peaks then collapses; 1e-5 holds), and larger
rollout groups (8 → 16) cut GRPO advantage variance. That recipe (LR 1e-5,
16 rollouts/example, thinking on) is what we carried into the protein-ligand run.
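The link between group size and advantage variance can be illustrated with the commonly described GRPO formulation, in which each rollout's advantage is its reward minus the group mean, divided by the group standard deviation. The sketch below assumes that formulation; the exact variant used by the hosted trainer is not stated here. `baseline_stderr` is a hypothetical helper that treats each rollout as an independent Bernoulli trial with success rate `p`.

```python
import math
import statistics

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

def baseline_stderr(p: float, group_size: int) -> float:
    """Standard error of the group-mean baseline for binary rewards.

    Assumes independent rollouts, each correct with probability p.
    """
    return math.sqrt(p * (1 - p) / group_size)

# One group of four binary rewards: the single wrong rollout gets a negative advantage.
print(group_advantages([1.0, 0.0, 1.0, 1.0]))

# With reward around 0.9, doubling the group size shrinks the baseline noise.
for g in (8, 16):
    print(g, round(baseline_stderr(0.9, g), 4))
```

Because the group mean serves as the baseline, a noisier mean makes every advantage in the group noisier, which is one way to read the 8 → 16 result above.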
Usage
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "poolside/Laguna-XS.2"
tok = AutoTokenizer.from_pretrained(base)
# Load the base model in bfloat16; device_map="auto" places it on the available devices.
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
# Attach the LoRA adapter on top of the base weights (the adapter is not merged).
model = PeftModel.from_pretrained(model, "poolside-laguna-hackathon/protein-ligand-design")
For the full tool-use evaluation loop, install and run the gym:
prime env install jdthewlis/protein-ligand-design
prime eval run jdthewlis/protein-ligand-design -m <your-deployment> -n 20 -r 3
Built by Team JAMMY for the poolside Laguna hackathon. Trained with GRPO on Prime Intellect Hosted Training; environment questions generated with Claude Opus 4.8.

