🧪 Protein-Ligand Design — LoRA adapter for `poolside/Laguna-XS.2`

poolside Laguna Hackathon submission — Team JAMMY. A LoRA adapter trained with reinforcement learning (GRPO) to make poolside/Laguna-XS.2 reason like a bench computational chemist / protein engineer: measure with tools, then commit an answer.

This is the trained adapter that goes with our environment and dataset:

➡️ Gym / dataset: poolside-laguna-hackathon/protein-ligand-design

The gym hands the model a molecule or protein plus a scientist's question, and the model must call CPU-only cheminformatics/proteomics tools (RDKit + Biopython) to measure the answer before committing. The reward is answer correctness only, and every ground-truth answer is computed by those same tools, so scoring is exact.

What this adapter is


| Property | Value |
|---|---|
| Type | PEFT LoRA adapter (not a merged model) |
| Base model | `poolside/Laguna-XS.2` |
| Rank `r` | 16 |
| `lora_alpha` | 32 |
| `lora_dropout` | 0.0 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`, `experts` |
| Dtype | F32 |

LoRA is applied to the attention projections and the MoE expert MLPs, which is why the adapter is large (~4.6 GB) despite being rank-16.
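To see why a rank-16 adapter can still reach several gigabytes, it helps to count parameters. A LoRA pair on a weight of shape (d_out, d_in) adds r * (d_in + d_out) parameters, and for MoE expert MLPs that cost is typically paid once per expert per layer. The sketch below is only an illustration: every model dimension in it is hypothetical (the real Laguna-XS.2 shapes are not stated here), and how PEFT lays out LoRA weights for the `experts` module may differ from this per-expert accounting. Only the formula and the F32 byte arithmetic should be taken at face value.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


def adapter_bytes(n_params: int, bytes_per_param: int = 4) -> int:
    """Storage for n_params weights; 4 bytes per weight corresponds to F32."""
    return n_params * bytes_per_param


# HYPOTHETICAL shapes, chosen only to show the scaling (not Laguna-XS.2's real config).
HIDDEN = 2048       # model width
EXPERT_FFN = 1024   # per-expert MLP width
N_EXPERTS = 64      # experts per MoE layer
N_LAYERS = 24
R = 16              # LoRA rank, as in the table above

# Attention: q/k/v/o, all assumed HIDDEN -> HIDDEN for simplicity.
attn_per_layer = 4 * lora_params(HIDDEN, HIDDEN, R)

# Experts: gate/up (HIDDEN -> EXPERT_FFN) and down (EXPERT_FFN -> HIDDEN), per expert.
per_expert = 2 * lora_params(HIDDEN, EXPERT_FFN, R) + lora_params(EXPERT_FFN, HIDDEN, R)
expert_per_layer = N_EXPERTS * per_expert

total_params = N_LAYERS * (attn_per_layer + expert_per_layer)
print(f"attention LoRA params per layer: {attn_per_layer:,}")
print(f"expert LoRA params per layer:    {expert_per_layer:,}")
print(f"total size at F32: {adapter_bytes(total_params) / 1e9:.2f} GB")

# Going the other way: a ~4.6 GB file stored in F32 implies this many weights
# (treating 1 GB as 1e9 bytes).
implied_params = 4.6e9 / 4
print(f"implied adapter params: {implied_params / 1e9:.2f} billion")
```

With these made-up shapes the expert term is dozens of times larger than the attention term, which is the qualitative point: the expert count multiplies the LoRA cost, and F32 storage doubles the size relative to a bf16 adapter.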

Training

Trained on Prime Intellect Hosted Training — shared run dashboard:

Algorithm: GRPO
Reward: binary final-answer correctness (1.0 correct / 0.0 wrong) — using tools is the means, never the reward
Learning rate: 1e-5
Rollouts per example: 16
Batch size: 128
Max tokens: 4096, thinking enabled
Steps: 80 (full run); the final checkpoint is the (tied-)best on eval.
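As a rough illustration of how a binary reward and 16 rollouts per example fit together, here is a minimal sketch of the group-normalized advantage commonly used in GRPO. It assumes the textbook formulation (subtract the group mean, divide by the group standard deviation); the hosted trainer may normalize differently, and the string-matching rule in `binary_reward` is a hypothetical stand-in for the gym's actual answer checker.

```python
from statistics import mean, pstdev


def binary_reward(predicted: str, truth: str) -> float:
    """1.0 if the committed answer matches, else 0.0.

    Case/whitespace normalization here is an assumption, not the gym's rule.
    """
    return 1.0 if predicted.strip().lower() == truth.strip().lower() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, 16 rollouts: 14 commit the right answer, 2 commit the wrong one.
answers = ["yes"] * 14 + ["no"] * 2
rewards = [binary_reward(a, "yes") for a in answers]
advantages = group_advantages(rewards)

print("mean reward:", mean(rewards))
print("advantage of a correct rollout:", round(advantages[0], 3))
print("advantage of a wrong rollout:  ", round(advantages[-1], 3))

# If every rollout in the group gets the same reward, all advantages are zero
# and that prompt contributes no gradient signal.
print("all-correct group:", group_advantages([1.0] * 16)[:3])
```

Note that tool calls never appear in the reward: a rollout that measures carefully but commits the wrong verdict scores the same 0.0 as one that guesses wrong.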

What training changed

On a held-out set of 60 questions, eval accuracy (avg@1) rose from the base model's 91.7% to 96.7%, a gain of 5.0 percentage points (55 to 58 of 60 questions correct), with the final checkpoint tied for best. Training was stable throughout: reward held around 0.9 (normal GRPO variance, dipping to ~0.65 and recovering), completions stayed at ~900 tokens (no ballooning), and there were no truncated or failed rollouts.

A concrete example. One question the base model got wrong before training:

For a kinase backbone-carbonyl halogen bond, is 4-bromo-7-azaindole (Brc1ccc2[nH]ccc2n1) a candidate — it must carry a heavy halogen (Cl/Br/I) for the halogen bond, an aromatic ring for π-stacking, and LogP between 1 and 3? (correct answer: yes)

The base model ran the tools (substructure matches for the halogen and ring, plus descriptors) but committed the wrong verdict. This is the kind of tool-grounded judgement the adapter sharpens.
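The three checks in that question can be reproduced with a few RDKit calls. This is a minimal sketch of the measurements, not the gym's actual tool code: the function name `screen`, the SMARTS pattern, and the choice of Crippen LogP (RDKit's built-in estimate) are assumptions made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, rdMolDescriptors

SMILES = "Brc1ccc2[nH]ccc2n1"  # SMILES quoted in the question above


def screen(smiles: str, logp_min: float = 1.0, logp_max: float = 3.0) -> dict:
    """Apply the three criteria from the question and return each measurement."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # Heavy halogen (Cl/Br/I) for the halogen bond.
    heavy_halogen = Chem.MolFromSmarts("[Cl,Br,I]")
    has_halogen = mol.HasSubstructMatch(heavy_halogen)

    # At least one aromatic ring for pi-stacking.
    n_aromatic_rings = rdMolDescriptors.CalcNumAromaticRings(mol)

    # Crippen LogP estimate (assumed to be the descriptor the gym uses).
    logp = Crippen.MolLogP(mol)

    verdict = has_halogen and n_aromatic_rings >= 1 and logp_min <= logp <= logp_max
    return {
        "has_heavy_halogen": has_halogen,
        "aromatic_rings": n_aromatic_rings,
        "logp": logp,
        "candidate": verdict,
    }


result = screen(SMILES)
for name, value in result.items():
    print(f"{name}: {value}")
```

The point of the example is that all three measurements are cheap and deterministic; the failure mode being trained away is in combining them into the final yes/no, not in obtaining them.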

Honest framing: the base model is already strong on this set (~92%), so the available headroom was modest (about 8 points), and the adapter closed 5 of them.
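The percentages in this section can be cross-checked with a little arithmetic. The sketch below assumes avg@1 means exactly one scored sample per question, so each reported accuracy maps to a whole number of correct answers out of 60.

```python
N_QUESTIONS = 60


def implied_correct(accuracy_pct: float, n: int) -> int:
    """Whole number of correct answers implied by a rounded accuracy."""
    return round(accuracy_pct / 100 * n)


base_correct = implied_correct(91.7, N_QUESTIONS)
final_correct = implied_correct(96.7, N_QUESTIONS)

gain_points = (final_correct - base_correct) / N_QUESTIONS * 100
headroom = N_QUESTIONS - base_correct          # questions the base model missed
remaining = N_QUESTIONS - final_correct        # questions still missed after training
headroom_closed = (final_correct - base_correct) / headroom

print(f"base:  {base_correct}/{N_QUESTIONS}")
print(f"final: {final_correct}/{N_QUESTIONS}")
print(f"gain:  {gain_points:.1f} points")
print(f"net share of headroom closed: {headroom_closed:.0%} ({remaining} still wrong)")
```

This counts net change only; it does not say whether the questions gained are exactly the ones the base model missed.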

Choosing the training recipe

The stable hyperparameters above didn't come for free — they're the output of a sweep on a precursor environment, allan/science-gym-bio. The lesson: learning rate is the stability lever (5e-5 peaks then collapses; 1e-5 holds), and larger rollout groups (8 → 16) cut GRPO advantage variance. That recipe — LR 1e-5, 16 rollouts/example, thinking on — is what we carried into the protein-ligand run.
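One way to see why larger rollout groups help is to treat each rollout as an independent coin flip with success probability p. That is a simplification (rollouts for one prompt are not truly independent of prompt difficulty), and the value p = 0.9 is taken from the reward level reported above. Under that assumption, the sketch compares group sizes 8 and 16 on two quantities: the noise in the group-mean baseline, and how often a group is unanimous and therefore yields zero advantage everywhere.

```python
from math import sqrt


def baseline_std_error(p: float, group_size: int) -> float:
    """Standard error of the group-mean reward for Bernoulli(p) rollouts."""
    return sqrt(p * (1 - p) / group_size)


def zero_signal_fraction(p: float, group_size: int) -> float:
    """Probability that all rollouts agree (all correct or all wrong)."""
    return p ** group_size + (1 - p) ** group_size


P_CORRECT = 0.9  # assumed per-rollout success rate

for g in (8, 16):
    se = baseline_std_error(P_CORRECT, g)
    dead = zero_signal_fraction(P_CORRECT, g)
    print(f"group size {g:2d}: baseline std error {se:.3f}, unanimous groups {dead:.1%}")
```

Doubling the group size shrinks the baseline noise by a factor of sqrt(2) and, at high accuracy, sharply reduces the share of prompts that produce no learning signal at all.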

Usage

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model first; the adapter is not merged, so it cannot be loaded on its own.
base = "poolside/Laguna-XS.2"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

# Attach the LoRA weights from this repository on top of the base model.
model = PeftModel.from_pretrained(model, "poolside-laguna-hackathon/protein-ligand-design")

For the full tool-use evaluation loop, install and run the gym:

prime env install jdthewlis/protein-ligand-design
prime eval run jdthewlis/protein-ligand-design -m <your-deployment> -n 20 -r 3

Built by Team JAMMY for the poolside Laguna hackathon. Trained with GRPO on Prime Intellect Hosted Training; environment questions generated with Claude Opus 4.8.
