
🧪 Protein-Ligand Design: LoRA adapter for poolside/Laguna-XS.2

poolside Laguna Hackathon submission by Team JAMMY. A LoRA adapter trained with reinforcement learning (GRPO) to make poolside/Laguna-XS.2 reason like a bench computational chemist / protein engineer: measure with tools, then commit an answer.

This is the trained adapter that goes with our environment and dataset:

➡️ Gym / dataset: poolside-laguna-hackathon/protein-ligand-design

The gym hands the model a molecule or protein plus a scientist's question, and the model must call CPU-only cheminformatics/proteomics tools (RDKit + Biopython) to measure the answer before committing. The reward is answer correctness only, and every ground-truth answer is computed by those same tools, so scoring is exact.

What this adapter is

Type: PEFT LoRA adapter (not a merged model)
Base model: poolside/Laguna-XS.2
Rank (r): 16
lora_alpha: 32
lora_dropout: 0.0
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, experts
Dtype: F32

LoRA is applied to the attention projections and the MoE expert MLPs, which is why the adapter is large (~4.6 GB) despite being rank-16.
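As a back-of-the-envelope check on that size, here is a minimal sketch. A LoRA pair on a d_in × d_out weight adds r·(d_in + d_out) parameters, and F32 stores each parameter in 4 bytes. Only the rank (16), the dtype (F32), and the ~4.6 GB figure come from this card; the 2048 × 2048 layer shape is a hypothetical placeholder, not Laguna-XS.2's actual dimensions.

```python
BYTES_PER_F32 = 4


def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def params_from_size(size_gb: float, bytes_per_param: int = BYTES_PER_F32) -> float:
    """Approximate parameter count for a checkpoint of the given size (1 GB = 1e9 bytes)."""
    return size_gb * 1e9 / bytes_per_param


# ~4.6 GB stored in F32 corresponds to roughly 1.15 billion adapter parameters.
adapter_params = params_from_size(4.6)

# Hypothetical layer shape, for illustration only (not the real model dimensions).
one_matrix = lora_params(d_in=2048, d_out=2048)  # 16 * (2048 + 2048) = 65536
```

A single rank-16 pair is tiny, so a total near a billion parameters implies a very large number of adapted matrices, which is consistent with adapting every MoE expert MLP.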

Training

Trained on Prime Intellect Hosted Training (shared run dashboard):

  • Algorithm: GRPO
  • Reward: binary final-answer correctness (1.0 correct / 0.0 wrong); using tools is the means, never the reward
  • Learning rate: 1e-5
  • Rollouts per example: 16
  • Batch size: 128
  • Max tokens: 4096, thinking enabled
  • Steps: 80 (full run); the final checkpoint is the (tied-)best on eval.
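The reward and the group-relative advantage behind GRPO can be sketched as follows. This is an illustration under assumptions, not the Prime Intellect trainer's code: the string-matching rule in `binary_reward` is hypothetical, and some GRPO implementations skip the standard-deviation scaling shown here.

```python
from statistics import mean, pstdev


def binary_reward(answer: str, truth: str) -> float:
    """1.0 if the committed answer matches the ground truth, else 0.0.

    The case-insensitive string match is a hypothetical rule for illustration.
    """
    return 1.0 if answer.strip().lower() == truth.strip().lower() else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Center each rollout's reward on its group mean and scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt with 16 rollouts, 14 of which committed the correct answer.
rewards = [1.0] * 14 + [0.0] * 2
advantages = group_advantages(rewards)
```

With a binary reward, a group in which all 16 rollouts agree (all correct or all wrong) yields all-zero advantages, so only prompts with mixed outcomes contribute a learning signal.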

What training changed

On a held-out set of 60 questions, eval accuracy (avg@1) rose from the base model's 91.7% to 96.7%, a gain of 5 percentage points (55 to 58 of 60 questions correct), with the final checkpoint tied for best. Training was stable throughout: reward held around 0.9 (normal GRPO variance, dipping to ~0.65 and recovering), completions stayed around 900 tokens (no ballooning), and there were no truncated or failed rollouts.

Figure: held-out eval curve

A concrete example. One question the base model got wrong before training:

For a kinase backbone-carbonyl halogen bond, is 4-bromo-7-azaindole (Brc1ccc2[nH]ccc2n1) a candidate: it must carry a heavy halogen (Cl/Br/I) for the halogen bond, an aromatic ring for π-stacking, and LogP between 1 and 3? (correct answer: yes)

The base model ran the tools (substructure matches for the halogen and ring, plus descriptors) but committed the wrong verdict. This is the kind of tool-grounded judgement the adapter sharpens.
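The measure-then-commit pattern for this question might look like the sketch below. It is not the gym's actual tool code. The function names are hypothetical, the LogP bounds are assumed to be inclusive, and Crippen LogP (RDKit's `MolLogP`) is assumed to be the descriptor in use. RDKit is imported inside `measure` so that the verdict logic can be read and run on its own.

```python
def is_candidate(has_heavy_halogen: bool, has_aromatic_ring: bool, logp: float) -> bool:
    """Commit step: all three criteria from the question must hold (bounds assumed inclusive)."""
    return has_heavy_halogen and has_aromatic_ring and 1.0 <= logp <= 3.0


def measure(smiles: str) -> tuple[bool, bool, float]:
    """Measure step: substructure and descriptor checks with RDKit (requires rdkit)."""
    from rdkit import Chem
    from rdkit.Chem import Crippen, rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # Heavy halogen: any Cl, Br, or I atom in the molecule.
    heavy_halogen = mol.HasSubstructMatch(Chem.MolFromSmarts("[Cl,Br,I]"))
    # Aromatic ring: at least one aromatic ring in the ring system.
    aromatic = rdMolDescriptors.CalcNumAromaticRings(mol) > 0
    # Crippen LogP estimate (an assumption about which LogP the gym uses).
    logp = Crippen.MolLogP(mol)
    return heavy_halogen, aromatic, logp
```

With RDKit installed, `is_candidate(*measure("Brc1ccc2[nH]ccc2n1"))` runs both steps on the SMILES from the question. Separating the two makes the failure mode above visible: the measurements can be right while the final verdict is still wrong.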

Honest framing: the base model is already strong on this set (~92%), so the available headroom was modest, and the adapter captured most of it.

Choosing the training recipe

The stable hyperparameters above didn't come for free: they're the output of a sweep on a precursor environment, allan/science-gym-bio. The lesson: learning rate is the stability lever (5e-5 peaks then collapses; 1e-5 holds), and larger rollout groups (8 → 16) cut GRPO advantage variance. That recipe (LR 1e-5, 16 rollouts/example, thinking on) is what we carried into the protein-ligand run.
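One way to see why the larger group helps is a simplified model, not a result from the sweep itself. Treat each rollout as an independent coin flip that succeeds with probability p. The group-mean baseline then has standard error sqrt(p(1-p)/n), and a group gives no gradient signal when all n rollouts agree. The value p = 0.9 is chosen to match the reward level reported above; real prompts vary in difficulty, so this is only indicative.

```python
from math import sqrt


def baseline_std_error(p: float, n: int) -> float:
    """Standard error of the group-mean reward for n independent Bernoulli(p) rollouts."""
    return sqrt(p * (1.0 - p) / n)


def prob_no_signal(p: float, n: int) -> float:
    """Probability that all n rollouts agree (all correct or all wrong)."""
    return p ** n + (1.0 - p) ** n


for n in (8, 16):
    print(n, round(baseline_std_error(0.9, n), 3), round(prob_no_signal(0.9, n), 3))
```

Under these assumptions, going from 8 to 16 rollouts shrinks the baseline's standard error by a factor of sqrt(2) and cuts the share of all-agree groups from about 43% to about 19%.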

Figure: science-gym-bio hyperparameter sweep

Usage

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and its tokenizer.
base = "poolside/Laguna-XS.2"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
# Attach the LoRA adapter on top of the base weights (the adapter is not merged).
model = PeftModel.from_pretrained(model, "poolside-laguna-hackathon/protein-ligand-design")

For the full tool-use evaluation loop, install and run the gym:

prime env install jdthewlis/protein-ligand-design
prime eval run jdthewlis/protein-ligand-design -m <your-deployment> -n 20 -r 3

Built by Team JAMMY for the poolside Laguna hackathon. Trained with GRPO on Prime Intellect Hosted Training; environment questions generated with Claude Opus 4.8.
